DB=USPT; PLUR=YES; OP=AND

□ L1 atp near4 transport 315

□ L2 L1 same (gram near4 positive) 7
DB=PGPB,USPT,USOC,EPAB,JPAB,DWPI,TDBD; PLUR=YES; OP=AND

□ L3 abc.ti,ab,clm. 5947

□ L4 L3 same (transports or system$2 or cassett$2 or complex$2 or operon$2).ti,ab,clm. 985

□ L5 L4 and (strep$ or staph$ or (gram near3 positive)) 73
DB=USPT; PLUR=YES; OP=AND

□ L6 US-6251629-B1.did. 1

Search Results - Record(s) 51 through 73 of 73 returned.

□ 51. 6077826 . 08 Jun 98; 20 Jun 00. Synthetic macromolecular channel assembly for transport of 
chloride ions through epithelium useful in treating cystic fibrosis. Tomich; John M., et al. 514/12; 
514/13 530/324 530/325 530/326. C07K014/00 A61K038/16. 



□ 52. 5773277 . 18 May 95; 30 Jun 98. Crystalline chondroitinase isolated from Proteus vulgaris 
ATCC 6896. Hashimoto; Nobukazu, et al. 435/232; 435/183 435/188 435/201. C12N009/88 
C12N009/44. 



□ 53. JP411253178A . 07 Oct 98. 21 Sep 99. ABC TRANSPORTER. WARREN, RICHARD L.
C12N015/09; C07K014/31 C07K016/12 C12P021/02 C12Q001/68 G01N033/53 G01N033/577 
A61K031/00 A61K038/00 A61K048/00. 



□ 54. WO003087147A1 . 14 Apr 03. 23 Oct 03. STREPTOCOCCAL GENES INVOLVED IN
OSMOTIC AND OXIDATIVE STRESS AND IN VIRULENCE. BROWN, JEREMY STUART. 
C07K014/315; A61K035/74 C07K016/12. 



□ 55. WO009957281A2 . 06 May 99. 11 Nov 99. PEPTIDE ANTIBIOTICS WHICH INHIBIT THE
GROWTH OF PNEUMOCOCCI, ABC TRANSPORTER AND TWO-COMPONENT SIGNAL 
TRANSDUCTION SYSTEM PROTEINS FROM STREPTOCOCCUS PNEUMONIAE, AND 
METHODS USING THE SAME. NOVAK, RODGER, et al. C12N015/31; C12N015/54 C12N015/55 
C12N009/12 C12N009/14 C07K007/08 C07K014/315 C07K016/12 C12Q001/68 A61K039/09. 



□ 56. WO009801154A2 . 07 Jul 97. 15 Jan 98. TREATMENT AND DIAGNOSIS OF INFECTIONS
OF GRAM POSITIVE COCCI. BURNIE, JAMES PETER, et al. A61K039/00; A61K039/02 
A61K039/09 A61K039/40 G01N033/569 C07K014/31 C07K016/12 A61K039/085. 



□ 57. WO2003087147A . Peptides derived from phg ABC operon, useful for the manufacture of a
medicament for treating or preventing a condition associated with infection by Streptococcus 
pneumoniae or other gram-positive bacteria. BROWN, J S. A61K035/74 C07K014/315 C07K016/12. 

□ 58. US20030118992A . A new ABC transporter polypeptide from Staphylococcus aureus is useful
to diagnose, prevent or treat microbial infections and diseases. WARREN, R L. C07H021/04 
C07K014/47 C12N005/06 C12P021/02 C12Q001/68 G01N033/48 G01N033/50 G06F019/00. 



□ 59. WO2003037310A . Composition useful for treating microbial infections, e.g. multidrug
resistant microbial infections comprises opioid inhibitor of adenosine triphosphate-binding cassette drug 
transporter and anti-microbial agent. SCHOENHARD, G L. A61K031/00 A61K031/4709 A61K031/485 
A61K031/496 A61K031/704 A61K031/7048 A61K031/7072 A61K031/7076 A61K038/13 
A61K038/14. 



□ 60. WO 200234773 A . Novel Streptococcus pneumoniae iron uptake ABC transporter peptide, 
useful in screening assay for identifying antimicrobial drug and in diagnostic assay for detecting 
streptococcal microorganism. BROWN, J S, et al. A61K039/09 C07K014/315 C07K016/12
C12N001/21 C12N015/63.

□ 61. US20020081687A . Novel sitosterolemia susceptibility gene polypeptide and polynucleotide,
useful for screening a compound that increases the level of expression or activity of SSG polypeptide for 
treating sterol-related disorder. SCHULTZ, J, et al. C07H021/04 C07K014/00 C12N005/06 
C12N009/02 C12P021/02. 



□ 62. US 6251629 B. New ABC transporter polypeptides and polynucleotides from Staphylococcus 
aureus, useful for screening antibiotics or as research reagents for diagnosing human diseases, e.g. 
emphysema, toxic shock syndrome or wound infection. WARREN, R L. C07H021/04 C12N015/31 
C12N015/62 C12N015/63.



□ 63. EP 1074623 A . New Staphylococcus aureus ABC transporter polypeptides useful as diagnostic
reagents, or for treating bacterial infections, e.g. otitis media, lung abscess, impetigo or wound infection. 
WARREN, RL. C07K014/31 C07K016/12 C12N015/31 C12Q001/68 G01N001/00. 

□ 64. US20020091092A . Use of propionyl L-carnitine or its salts for inducement of apoptosis used
in the treatment of e.g. hypertension, pulmonary hypertension, the prevention of restenosis after 
angioplasty or coronary stenting, and for the treatment of tumors. CALVANI, M, et al. A61K031/00 
A61K031/198 A61K031/205 A61K031/22 A61K031/223 A61K031/225 A61K031/47 A61K031/7024 
A61K045/00 A61P009/00 A61P009/12 A61P035/00 A61P043/00. 



□ 65. WO 200027386A . Use of propionyl L-carnitine and its salts for inhibiting the proliferation of
smooth muscular cells of the vascular wall used in e.g. the treatment of atherosclerosis, hypertension,
pulmonary hypertension and preventing restenosis is new. CALVANI, M. A61K031/22. 

□ 66. US 6448224B . New nucleic acid and peptides, useful as antibiotic peptides. NOVAK, R, et al. 
A61K038/02 A61K038/16 A61K039/09 C07K007/08 C07K014/315 C07K016/12 C12N009/12 
C12N009/14 C12N015/31 C12N015/54 C12N015/55 C12Q001/68.



□ 67. WO 9950418A . Novel methods for the treatment and diagnosis of staphylococcal infections.
BURNIE, J P. A61K031/7088 A61K038/00 A61K039/00 A61K039/085 A61K048/00 A61P031/04 
C07K014/31 C07K016/00 C07K016/12 C12N015/09 C12N015/31 C12P021/02 C12Q001/04 
C12Q001/68 G01N033/569 G01N033/68 C12N015/09 C12N015/09 C12N015/09 C12R001:44 
C12R001:445 C12R001:45.



□ 68. US 6300094B . New ABC transporter from Staphylococcus aureus - useful to treat infections.
TRAINI, C M, et al. C07K014/31 C07K016/12 C12N001/15 C12N001/19 C12N001/21 C12N005/10 
C12N015/09 C12N015/31 C12N015/63 C12N015/74 C12P021/02 C12Q001/68 G01N033/53 
C12N015/09 C12R001:445. 



□ 69. EP 908516A . New polypeptides encoding members of the ABC transporter family from
Staphylococcus aureus useful for diagnosing and treating diseases such as osteomyelitis and toxic shock
syndrome. WARREN, RL. C07K014/31 C07K016/12 C12N015/31 C12Q001/68 G01N001/00. 

□ 70. WO 9801154A . Treating and diagnosing bacterial and fungal infection with ABC transporter
protein - or neutralising or binding agents, and new Staphylococcal proteins, particularly for infections 
caused by drug resistant Staphylococci and Enterococci. BURNIE, J P, et al. A61K038/17 A61K039/00 
A61K039/02 A61K039/085 A61K039/09 A61K039/395 A61K039/40 A61K045/00 A61P031/00 
A61P031/04 C07H021/04 C07K014/31 C07K014/315 C07K014/435 C07K016/12 C12N001/18 
C12N001/21 C12N005/06 C12N015/09 C12P021/02 G01N033/569. 



□ 71. EP 569541B . Treatment and repair of defects or lesions in cartilage - by a growth factor
containing matrix, esp. useful for treating osteoarthritis and traumatic cartilage damage. HUNZIKER, E
B. A61F002/02 A61F002/30 A61K000/00 A61K009/00 A61K009/127 A61K009/70 A61K037/00
A61K037/02 A61K037/22 A61K037/24 A61K037/36 A61K037/48 A61K038/18 A61K038/27
A61K045/06 A61K047/00 A61K047/30 A61K047/42 C07K015/06 C09H003/02.

http://westbrs:9000/bin/cgi-bin/accum_query.pl 6/9/04



□ 72. HU 45276T . Prepn. of oligomycin ABC complex by fermentation - with streptomyces 
diastatochromogenes var RO-31. ISTVAN, H, et al. C12P017/18. 



□ 73. US 3501568A . Antibiotic a3823 complex (factors a,b,c and d) antibacterial antifungal
anticoccidial antineoplastic. A61K021/00 C07G011/00.
L12: Entry 3 of 18 File: PGPB Jan 2, 2003

DOCUMENT-IDENTIFIER: US 20030004097 A1

TITLE: Methods and compositions for inducing autoimmunity in the treatment of
cancers



Detail Description Paragraph:

[0178] Various methods have been used to determine the presence of PS on cell
membranes. These include direct chemical modification with membrane impermeable
reagents such as trinitrobenzenesulfonic acid and hydrolysis with specific
phospholipases (Gordesky et al., 1975; Etemadi, 1980), direct labeling with PS
binding proteins (Thiagarajan and Tait, 1990; Tait and Gibson, 1994; Vermes et al.,
1995; Kuypers et al., 1996), and PS-dependent catalysis of coagulation (Rosing et
al., 1980; Van Dieijen et al., 1981). Several laboratories used lipid antibodies to
detect cell surface PS (Maneta-Peyret et al., 1993; Rote et al., 1993; Rote et al.,
1995; Katsuragawa et al., 1995). However, many of these antibodies are not specific
and cross-reactivity is common. This may be due to the weak antigenic presentation
of the phosphorylated head groups that are critical to specificity or to the
generation of antibodies to diacylglycerol, phosphodiester and/or fatty acid
moieties that are common to all phospholipids. In an attempt to produce specific PS
antibodies, the inventor immunized rabbits with PS covalently coupled to bovine
serum albumin or KLH via its fatty acid side chain without modifying the crucial
phosphoserine moiety.
L17: Entry 21 of 54 File: USPT Jan 2, 2001

DOCUMENT-IDENTIFIER: US 6168790 B1

** See image for Certificate of Correction **

TITLE: Use of antibodies to block the effects of gram-positive bacteria and
mycobacteria

Detailed Description Text (163):

Heparinized whole mouse blood was distributed in a microtiter plate
(200 µl/well) and incubated in presence of LPS and polyclonal anti-murine CD14
IgG. After 4 hours incubation at 37 °C, conditioned plasma were assayed for
TNF bioactivity using the method of Espevik and Nissen-Meyer, supra. In experiments
with THP-1 cells, cells were washed 2 times with serum-free RPMI containing 0.5
mg/ml human serum albumin, resuspended in serum free media, and distributed at the
concentration of 5-7 × 10^4 cells/well. Fetal bovine serum (Sigma) was
added to obtain a final concentration of 5%. Various concentrations of LPS, cell
wall preparations, LAM or soluble peptidoglycan were added to the cells with or
without antibodies in duplicate, and incubated at 37 °C for 7 hours. Cell
free supernates were then sampled and frozen at -20 °C. IL-8 was measured
with an ELISA as previously described by Standiford, et al. (J. Immunol., 145:1435-
1439, 1990), with results as shown in FIG. 13.
L17: Entry 26 of 54 File: USPT Aug 15, 2000

DOCUMENT-IDENTIFIER: US 6103468 A

TITLE: Rapid two-stage polymerase chain reaction method for detection of lactic acid
bacteria in beer

Brief Summary Text (24):

U.S. Pat. No. 5,139,933 discloses an assay method to quickly detect the presence of
Listeria strains in samples, characterized by the use of antibodies to selectively
capture the peptidoglycan and teichoic acid components of the listeriae bacterial
cell wall.
SYSTEM:OS - DIALOG OneSearch

File 155:MEDLINE(R) 1966-2004/May W5
(c) format only 2004 The Dialog Corp.
*File 155: Medline has been reloaded. Accession numbers
have changed. Please see HELP NEWS 154 for details.
File 348:EUROPEAN PATENTS 1978-2004/May W04
(c) 2004 European Patent Office
File 349:PCT FULLTEXT 1979-2002/UB=20040527,UT=20040520
(c) 2004 WIPO/Univentio
File 654:US Pat. Full. 1976-2004/Jun 01
(c) Format only 2004 The Dialog Corp.
*File 654: US published applications now online. See HELP NEWS 654
for details. Reassignments current through December 2, 2003.
File 5:Biosis Previews(R) 1969-2004/May W5
(c) 2004 BIOSIS
File 34:SciSearch(R) Cited Ref Sci 1990-2004/May W5
(c) 2004 Inst for Sci Info
File 35:Dissertation Abs Online 1861-2004/May
(c) 2004 ProQuest Info&Learning
File 73:EMBASE 1974-2004/May W5
(c) 2004 Elsevier Science B.V.
File 144:Pascal 1973-2004/May W4
(c) 2004 INIST/CNRS
File 357:Derwent Biotech Res. 1982-2004/Jun W1
(c) 2004 Thomson Derwent & ISI
File 440:Current Contents Search(R) 1990-2004/Jun 03
(c) 2004 Inst for Sci Info

Set Items Description

Cost is in DialUnits
?t S2/9/1 10

2/9/1 (Item 1 from file: 155) 

DIALOG(R) File 155:MEDLINE(R)
(c) format only 2004 The Dialog Corp. All rts. reserv.

09860124 PMID: 8212846

Assessment of non-protein impurities in potential vaccine proteins 
produced by Bacillus subtilis. 

Himanen J P; Sarvas M; Helander I M 

Department of Molecular Bacteriology, National Public Health Institute,
Helsinki, Finland.

Vaccine (ENGLAND) 1993, 11 (9) p970-3, ISSN 0264-410X 
Journal Code: 8406899 

Document type: Journal Article 

Languages: ENGLISH

Main Citation Owner: NLM 

Record type: Completed 

Subfile: INDEX MEDICUS 

The levels of non-protein impurities at different stages of purification 
of model vaccine proteins produced by Bacillus subtilis were assessed with 
special emphasis on peptidoglycan-wall teichoic acid and lipoteichoic 
acid. Intracytoplasmically produced proteins were purified by disrupting 
the lysozyme protoplasts using osmotic shock, depositing the inclusion 
bodies by low-speed centrifugation, and washing them with detergent. By
this procedure most of the cell envelope-derived impurities could be 
removed. The final product contained less than 1% (w/w) of neutral sugars, 
fatty acids, phosphate, hexosamine, diaminopimelic acid and glycerol. A
secreted protein was purified from the culture supernatant by successive 
ion-exchange and adsorption chromatography. The cell envelope-derived 
impurities were efficiently removed by the cation-exchanger, and the final 
product contained only minute amounts of non-protein components. The 
amounts of non-protein components such as peptidoglycan and lipoteichoic 
acid in proteins produced in either mode were shown to be negligible in 
relation to their potentially harmful biological effects. 

Tags: Comparative Study; Human; Support, Non-U.S. Gov't

Descriptors: *Bacillus subtilis--metabolism--ME; *Bacterial Outer
Membrane Proteins--isolation and purification--IP; *Lipopolysaccharides
--analysis--AN; *Pertussis Toxin; *Recombinant Fusion Proteins--isolation
and purification--IP; *Teichoic Acids--analysis--AN; *Vaccines, Synthetic
--analysis--AN; *Virulence Factors, Bordetella--isolation and purification
--IP; Carbohydrates--analysis--AN; Cell Fractionation--methods--MT;
Chromatography--methods--MT; Detergents; Drug Contamination; Fatty Acids
--analysis--AN; Phosphates--analysis--AN; Recombinant Fusion Proteins
--immunology--IM; Vaccines, Synthetic--isolation and purification--IP

CAS Registry No.: 0 (Bacterial Outer Membrane Proteins); 0
(Carbohydrates); 0 (Detergents); 0 (Fatty Acids); 0
(Lipopolysaccharides); 0 (Phosphates); 0 (Recombinant Fusion Proteins);
0 (Teichoic Acids); 0 (Vaccines, Synthetic); 0 (Virulence Factors,
Bordetella); 0 (pertussis toxin, S1 subunit); 0 (pertussis toxin, S4
subunit); 56411-57-5 (lipoteichoic acid)
Enzyme No.: EC 2.4.2.31 (Pertussis Toxin)
Record Date Created: 19931026
Record Date Completed: 19931026



2/9/10 (Item 1 from file: 35) 

DIALOG(R) File 35:Dissertation Abs Online
(c) 2004 ProQuest Info&Learning. All rts. reserv.

01321483 ORDER NO: NOT AVAILABLE FROM UNIVERSITY MICROFILMS INT'L.
HETEROLOGOUS VACCINE PROTEINS PRODUCED IN BACILLUS SUBTILIS: CHEMICAL AND
BIOLOGICAL ASSESSMENT OF HOST-DERIVED IMPURITIES (VACCINE)

Author: HIMANEN, JUHA-PEKKA
Degree: PH.D.
Year: 1992
Corporate Source/Institution: TURUN YLIOPISTO (FINLAND) (5760)
Source: VOLUME 54/04-C OF DISSERTATION ABSTRACTS INTERNATIONAL.
PAGE 1214. 59 PAGES
Descriptors: CHEMISTRY, BIOCHEMISTRY
Descriptor Codes: 0487
ISBN: 951-47-6548-6
Publisher: NATIONAL PUBLIC HEALTH INSTITUTE, MANNERHEIMINTIE 166,
SF-00300 HELSINKI, FINLAND

In this study, a new strategy for producing safe and efficient
vaccines was applied: the genes coding for proteins are cloned from the
pathogen to Bacillus subtilis, which is non-toxic for humans and
genetically well characterized. Three proteins were chosen for this study:
soluble subunits S1 (BacS1) and S4 (BacS4) of pertussis toxin, and an outer
membrane protein P1 of Neisseria meningitidis (BacP1). The proteins were
produced either as secreted proteins or intracytoplasmically to cover
different production systems.

Since there is only limited experience in using Bacillus subtilis for
producing pharmaceuticals, no generally accepted methods for checking the
bacillar non-protein impurities in the products are available. Therefore,
the fractionation of the main non-protein cell envelope components of
Bacillus subtilis, teichoic acids (TA's) and peptidoglycan (PG), during the
purification of BacS1, BacS4, and BacP1 was investigated. The final
preparations of BacS4 and BacP1 contained less than 1% (w/w) of all the
tested non-protein components, i.e. neutral sugars, fatty acids,
hexosamines, organic phosphate, DAP, and glycerol.

To estimate the biological significance of lipoteichoic acid (LTA)
and peptidoglycan-teichoic acid complex (PG-TA) in vaccine proteins, certain
biological activities of purified and chemically characterized bacillar LTA
and PG-TA were determined. LTA and PG-TA were found to be non-toxic for
mice and guinea pigs in a short-term toxicity assay. PG-TA was weakly
pyrogenic and mitogenic. Both LTA and PG-TA acted as immunologic adjuvants
in mice. Both LTA and PG-TA, when injected in mice, also caused an increase
in the number of granulocyte-monocyte colony-forming cells in the bone
marrow probably via stimulation of production of granulocyte-monocyte
colony-forming cells. With respect to vaccine production, the minute
amounts of LTA and PG-TA in the purified heterologous proteins produced in
Bacillus subtilis are not expected to cause harmful effects; instead, they
may have beneficial effects on host defence. (Abstract shortened by UMI.)
?t s2/3,kwic/2 3 6 8 9 



2/3,KWIC/2 (Item 1 from file: 348) 

DIALOG(R) File 348:EUROPEAN PATENTS
(c) 2004 European Patent Office. All rts. reserv.

00883189

IMMUNOMODULATORY COMPLEX AND USE THEREOF IN HELICOBACTER DISEASES
IMMUNMODULATORISCHER KOMPLEX UND DESSEN VERWENDUNG IN HELICOBACTER
ERKRANKUNGEN
COMPLEXE IMMUNOMODULATEUR ET SON UTILISATION DANS LES AFFECTIONS PAR
HELICOBACTER
PATENT ASSIGNEE:
TOROSSIAN, Fernand Narbey, (1858080), 10, rue Noel-Ballay, , F-31400
Toulouse, (FR), (Proprietor designated states: all)
INVENTOR:
TOROSSIAN, Fernand Narbey, 10, rue Noel-Ballay, , F-31400 Toulouse, (FR)
LEGAL REPRESENTATIVE:
Morelle, Guy Georges Alain (50595), Cabinet Morelle & Bardou, 9, Avenue
de l'Europe BP 53, 31527 Ramonville Cedex, (FR)



PATENT (CC, No, Kind, Date): EP 969851 A1 000112 (Basic)
                             EP 969851 B1 040526
                             WO 1997030716 970828
APPLICATION (CC, No, Date): EP 97906249 970225; WO 97FR334 970225
PRIORITY (CC, No, Date): FR 962445 960226
DESIGNATED STATES: AT; BE; CH; DE; DK; ES; FI; FR; GB; GR; IE; IT; LI; LU;
MC; NL; PT; SE
INTERNATIONAL PATENT CLASS: A61K-035/74; A61K-039/106; A61K-039/106;
A61K-31:57; A61K-38:39; A61K-39:108; A61K-035/74; A61K-38:39
NOTE:
No A-document published by EPO
LANGUAGE (Publication, Procedural, Application): French; French; French
FULLTEXT AVAILABILITY:
Available Text   Language    Update    Word Count
CLAIMS B         (English)   200422       354
CLAIMS B         (German)    200422       348
CLAIMS B         (French)    200422       338
SPEC B           (French)    200422      3998
Total word count - document A               0
Total word count - document B            5038
Total word count - documents A + B       5038


...SPECIFICATION teichoic acids (Bacteriol. Reviews, 37, 21, 215-57).

G.A. MILLER (1976) - Effects of streptococcal lipoteichoic acid on
host response in mice (Infect. and Immun., 1976, 13, (5), 1408-17).

A.J. WICKEN et coll. (1975) - Lipoteichoic acids: a new class of
bacterial antigens (Science, 187, 1161-67). Differents dosages possibles
A...

...Hexoses
T.A. SCOTT - Dosage colorimetr a l'anthrone (Anal. Chem. (1953), 25,
1956-61).
Hexosamines
L.A. ELSON (Biochem. J (1953), 27, 1824-28)
Lipopolysaccharides
J. JANDA et E. WORK...


2/3,KWIC/3 (Item 2 from file: 348) 

DIALOG(R) File 348:EUROPEAN PATENTS 
(c) 2004 European Patent Office. All rts. reserv. 

00796491 

ANTITUMOR PREPARATIONS CONTAINING A LIPOTEICHOIC ACID FROM STREPTOCOCCUS
ANTITUMORPRAPARATE, DIE EINE LIPOTEICHONSAURE AUS STREPTOCOCCUS ENTHALTEN
PREPARATIONS ANTITUMORALES CONTENANT UN ACIDE LIPOTEICHOIQUE TIRE DE
STREPTOCOCCUS

PATENT ASSIGNEE: 

Lunamed AG, (3874890), Kirschbaumweg 38, 4103 Bottmingen, (CH) , 
(Proprietor designated states: all) 
INVENTOR:
TRUOG, Peter, St. Johanns-Vorstadt 38, CH-4056 Basel, (CH)
ROTHLISBERGER, Peter, Breitensteinstrasse 100, CH-8037 Zurich, (CH)
be isolated.

Background of the Invention

Lipoteichoic acids (LTAs) are a group of amphipathic substances found
in the cell wall of gram...

...and a hydrophobic glycolipid moiety. The hydrophilic backbone may be
substituted with alanine, hexoses and hexosamines. The glycolipids
described so far were mainly dihexosylglycerols and some
variation in the degree of polymerization of the hydrophilic chain...
Detailed Description

new Streptococcus strain from which the new compound can be
isolated

Background of the Invention

Lipoteichoic acids (LTAs) are a group of amphipathic substances
found in the cell ...and a hydrophobic
glycolipid moiety. The hydrophilic backbone may be substituted
with alanine, hexoses and hexosamines. The glycolipids described so far
were mainly dihexosylglycerols and some tri
ABSTRACTED-PUB-NO: EP 807185B
BASIC-ABSTRACT:

The following are new:

(1) a purified lipoteichoic acid (LTA), isolated from Streptococcus sp PT strain
DSM 8747, designated 'LTA-T' and specifically of formula (I), and its salts;

(2) Streptococcus sp PT strain DSM 8747;

(3) a deacylated LTA cpd., designated 'LTA-T', of formula (I) in which both R1 and
R2 are replaced by H, and its salt, and

(4) beta-galacto-furanosyl(1-3)glycerol-di-(R2) ester of formula (II) (opt. as a
single cpd.) and its salt. R1 = H or D-alanyl with a ratio to P of 0.27-0.35; R2 =
residues of satd. or unsatd. fatty acids with 12C, 14C, 16C or 18C; n = 9 (mean
value).

USE - LTA-T has strong antitumour activity and is used for treating cancer
(claimed), opt. in combination with a monokine and/or hyaluronidase. LTA-T is also
used for lowering blood cholesterol levels (claimed). LTA-T and (II) are
degradation prods. of LTA-T and are useful as: (i) analytical tools for the
identification and characterisation of LTA-T and (ii) starting materials for the
prepn. of new LTA's with specific R2 gps. (e.g. by esterification of LTA-T with
specific fatty acids) or with the 6-OH of the galacto-furanosyl moiety esterified
by a defined hydrophilic gp. LTA-T is administered orally or parenterally, esp.
s.c., i.v. or i.p. at a concn. of 0.1-20 mu mol/ml. A typical antitumour dose is
0.001-20 (esp. 0.01-2) mg/kg, opt. in combination of 500-5000 (esp. ca 1000) U of
hyaluronidase and/or 0.1-20 x 10^6 U of monokine.
ABSTRACTED-PUB-NO: US 6114161A
EQUIVALENT-ABSTRACTS:

The following are new:

(1) a purified lipoteichoic acid (LTA), isolated from Streptococcus sp PT strain
DSM 8747, designated 'LTA-T' and specifically of formula (I), and its salts;

(2) Streptococcus sp PT strain DSM 8747;

(3) a deacylated LTA cpd., designated 'LTA-T', of formula (I) in which both R1 and
R2 are replaced by H, and its salt, and

(4) beta-galacto-furanosyl(1-3)glycerol-di-(R2) ester of formula (II) (opt. as a
single cpd.) and its salt. R1 = H or D-alanyl with a ratio to P of 0.27-0.35; R2 =
residues of satd. or unsatd. fatty acids with 12C, 14C, 16C or 18C; n = 9 (mean
value).

USE - LTA-T has strong antitumour activity and is used for treating cancer
(claimed), opt. in combination with a monokine and/or hyaluronidase. LTA-T is also
used for lowering blood cholesterol levels (claimed). LTA-T and (II) are
degradation prods. of LTA-T and are useful as: (i) analytical tools for the
identification and characterisation of LTA-T and (ii) starting materials for the
prepn. of new LTA's with specific R2 gps. (e.g. by esterification of LTA-T with
specific fatty acids) or with the 6-OH of the galacto-furanosyl moiety esterified
by a defined hydrophilic gp. LTA-T is administered orally or parenterally, esp.
s.c., i.v. or i.p. at a concn. of 0.1-20 mu mol/ml. A typical antitumour dose is
0.001-20 (esp. 0.01-2) mg/kg, opt. in combination of 500-5000 (esp. ca 1000) U of
hyaluronidase and/or 0.1-20 x 10^6 U of monokine.

The following are new:

(1) a purified lipoteichoic acid (LTA), isolated from Streptococcus sp PT strain
DSM 8747, designated 'LTA-T' and specifically of formula (I), and its salts;

(2) Streptococcus sp PT strain DSM 8747;

(3) a deacylated LTA cpd., designated 'LTA-T', of formula (I) in which both R1 and
R2 are replaced by H, and its salt, and

(4) beta-galacto-furanosyl(1-3)glycerol-di-(R2) ester of formula (II) (opt. as a
single cpd.) and its salt. R1 = H or D-alanyl with a ratio to P of 0.27-0.35; R2 =
residues of satd. or unsatd. fatty acids with 12C, 14C, 16C or 18C; n = 9 (mean
value).

USE - LTA-T has strong antitumour activity and is used for treating cancer
(claimed), opt. in combination with a monokine and/or hyaluronidase. LTA-T is also
used for lowering blood cholesterol levels (claimed). LTA-T and (II) are
degradation prods. of LTA-T and are useful as: (i) analytical tools for the
identification and characterisation of LTA-T and (ii) starting materials for the
prepn. of new LTA's with specific R2 gps. (e.g. by esterification of LTA-T with
specific fatty acids) or with the 6-OH of the galacto-furanosyl moiety esterified
by a defined hydrophilic gp. LTA-T is administered orally or parenterally, esp.
s.c., i.v. or i.p. at a concn. of 0.1-20 mu mol/ml. A typical antitumour dose is
0.001-20 (esp. 0.01-2) mg/kg, opt. in combination of 500-5000 (esp. ca 1000) U of
hyaluronidase and/or 0.1-20 x 10^6 U of monokine.

US 6214978B 

The following are new:

(1) a purified lipoteichoic acid (LTA), isolated from Streptococcus sp PT strain
DSM 8747, designated 'LTA-T' and specifically of formula (I), and its salts;

(2) Streptococcus sp PT strain DSM 8747;

(3) a deacylated LTA cpd., designated 'LTA-T', of formula (I) in which both R1 and
R2 are replaced by H, and its salt, and

(4) beta-galacto-furanosyl(1-3)glycerol-di-(R2) ester of formula (II) (opt. as a
single cpd.) and its salt. R1 = H or D-alanyl with a ratio to P of 0.27-0.35; R2 =
residues of satd. or unsatd. fatty acids with 12C, 14C, 16C or 18C; n = 9 (mean
value).

USE - LTA-T has strong antitumour activity and is used for treating cancer
(claimed), opt. in combination with a monokine and/or hyaluronidase. LTA-T is also
used for lowering blood cholesterol levels (claimed). LTA-T and (II) are
degradation prods. of LTA-T and are useful as: (i) analytical tools for the
identification and characterisation of LTA-T and (ii) starting materials for the
prepn. of new LTA's with specific R2 gps. (e.g. by esterification of LTA-T with
specific fatty acids) or with the 6-OH of the galacto-furanosyl moiety esterified
by a defined hydrophilic gp. LTA-T is administered orally or parenterally, esp.
s.c., i.v. or i.p. at a concn. of 0.1-20 mu mol/ml. A typical antitumour dose is
0.001-20 (esp. 0.01-2) mg/kg, opt. in combination of 500-5000 (esp. ca 1000) U of
hyaluronidase and/or 0.1-20 x 10^6 U of monokine.

WO 9623896A 
TAG 3'. The 1.1 kb PCR product was gel purified with
GeneClean II (Bio 101), restricted with NcoI and BamHI and
cloned into NcoI-BamHI cut and phosphatased Cadus 1122
to yield Cadus 1605. The sequence of Cadus 1605 was
verified by restriction analysis and dideoxy-sequencing of
double-stranded templates. Recombinant GPA_Bam-Gα hybrids
of Gαs, Gαi2, and Gα16 were generated.
Construction of Cadus 1855 encoding recombinant
GPA_Bam-Gα16 serves as a master example: construction of the
other hybrids followed an analogous cloning strategy. The
parental plasmid Cadus 1617, encoding native Gα16, was
restricted with NcoI and BamHI, treated with shrimp alkaline
phosphatase as per the manufacturer's specifications
and the linearized vector was purified by gel electrophoresis.
Cadus 1605 was restricted with NcoI and BamHI and the 1.1
kb fragment encoding the amino terminal 60% of GPA1 with
a novel BamHI site at the 3' end was cloned into the NcoI-
and BamHI-restricted Cadus 1617. The resulting plasmid
encoding the GPA_Bam-Gα16 hybrid was verified by
restriction analysis and assayed in tester strains for an ability
to couple to yeast Gβγ and thereby suppress the gpa1 null
phenotype. Two additional GPA_Bam-Gα hybrids, GPA_Bam-Gαs
and GPA_Bam-Gαi2, described in this application were
prepared in an analogous manner using Cadus 1606 as the
parental plasmid for the construction of the GPA_Bam-Gαi2
hybrid and Cadus 1181 as the parental plasmid for the
construction of the GPA_Bam-Gαs hybrid.

[0501] Coupling by chimeric Gα proteins. The Gα chimeras
described above were tested for the ability to couple
a mammalian G protein-coupled receptor to the pheromone
response pathway in yeast. The results of these experiments
are outlined in Table 5. Results obtained using GPA1_41-Gαi2
to couple the human C5a receptor to the pheromone
response pathway in autocrine strains of yeast are disclosed
in Example 10 above.



TABLE 1

ABC TRANSPORTERS*

Species                     System         Substrate

Bacteria
Salmonella typhimurium      OppABCDF       Oligopeptides
Streptococcus pneumoniae    AmiABCDEF      Oligopeptides
Bacillus subtilis           Opp (Spo0K)    Oligopeptides
E. coli                     Dpp            Dipeptides
Bacillus subtilis           DciA           Dipeptides
S. typhimurium              HisJQMP        Histidine
E. coli                     HisJQMP        Histidine
E. coli                     MalEFGK        Maltose
S. typhimurium              MalEFGK        Maltose
Enterobacter aerogenes      MalEFGK        Maltose
E. coli                     UgpABCE        sn-Glycerol-3-phosphate
E. coli                     AraFGH         Arabinose
E. coli                     RbsACD         Ribose
E. coli                     GlnHPQ         Glutamine
S. typhimurium              ProU (VWX)     Glycine-betaine
E. coli                     ProU (VWX)     Glycine-betaine
E. coli                     LivHMGF (JK)   Leucine-isoleucine-valine
E. coli                     PstABC         Phosphate
Pseudomonas stutzeri        NosDYF         Copper
E. coli                     ChlJD          Molybdenum
E. coli                     CysPTWAM       Sulphate-Thiosulfate
E. coli                     BtuCDE         Vitamin B12
E. coli                     FhuBCD         Fe3+-ferrichrome
E. coli                     FecBCDE        Fe3+-dicitrate
S. marcescens               SfuABC         Fe3+
Mycoplasma                  p37, 29, 69    ?
E. coli                     Phn/Psi        Alkyl-phosphonates (?)
Streptomyces peucetius      DrrAB          Daunomycin/Doxorubicin
Streptomyces fradiae        TlrC           Tylosin
Staphylococcus              MsrA           Erythromycin resistance
Agrobacterium tumefaciens   OccJQMP        Octopine
E. coli                     HlyB           Haemolysin
Pasteurella                 LtkB           Leukotoxin
E. coli                     CvaB           Colicin V
Erwinia chrysanthemi        PrtD           Proteases
Bordetella pertussis        CyaB           Cyclolysin
Streptococcus pneumoniae    ComA           Competence factor
Rhizobium meliloti          NdvA           β-1,2-glucan
Agrobacterium tumefaciens   ChvA           β-1,2-glucan
Haemophilus influenzae      BexAB          Capsule polysaccharide
E. coli                     KpsMT          Capsule polysaccharide
Neisseria                   CrtCD          Capsule polysaccharide
E. coli                     FtsE           Cell division
E. coli                     UvrA           DNA repair
Rhizobium leguminosarum     NodI           Nodulation
Rhizobium meliloti          OFR1           ?

Cyanobacteria
Anabaena                    HetA           Differentiation
Synechococcus               CysA           Sulphate

Yeast
S. cerevisiae               STE6           a-mating peptide
S. cerevisiae               ADP1           ?
S. cerevisiae               EF-3           Translation

Protozoa
Plasmodium                  pfMDR          Chloroquine
Leishmania                  ltpgpA         Methotrexate/heavy metals

Insect
Drosophila                  white-brown    Eye pigments
Drosophila                  Mdr49          Hydrophobic drugs?
                            Mdr65          ?

Plants
Liverwort chloroplast       MbpX           ?
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PROBLEM TO BE SOLVED: To obtain a new isolated polypeptide comprising an ABC 
transporter containing an amino acid sequence having high identity of a specific 
amino acid sequence along the whole length, useful for searching an antibacterial 
compound, treating and diagnosing an infection with a bacterium of the genus 
Streptococcus, etc. 

SOLUTION: This new ABC transporter comprises an isolated polypeptide containing an 
amino acid sequence having at least 70% identity of an amino acid sequence of the 
formula along the whole length of the formula and is useful for screening an 
antibacterial compound, treating and diagnosing an infection with Streptococcus 
aureus, etc. The isolated polypeptide is obtained by probing a library of 
chromosomal DNA clones of Streptococcus aureus WCUH29 with a radiolabeled 
oligonucleotide derived from a partial sequence, preferably a heptadecamer 
or longer, incorporating the obtained ABC transporter gene into an expression 
system, and culturing a host cell. 
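
The "at least 70% identity along the whole length" criterion in the abstract above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length (with "-" for gaps) and that identity is counted over every alignment column; the abstract does not state how identity is computed, so both choices, and the function names `percent_identity` and `meets_threshold`, are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the full length of two pre-aligned sequences.

    Assumption: identical residues are counted over every alignment column,
    and a gap character ("-") never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the pair reaches the identity threshold (70% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

The toy peptides used to exercise this sketch are invented for illustration and are not sequences from the record.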
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MOLECULAR CHARACTERIZATION OF STREPTOCOCCUS UBERIS CAMP FACTOR, LACTOFERRIN 
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Year: 1996 
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The gene coding for the CAMP factor from a strain of Streptococcus 
uberis (S. uberis) was cloned in E. coli . Chromosomal DNA from S. uberis 
was used to construct a gene library in plasmid pTZ18R and six 
CAMP-reaction positive clones were obtained from a total of 10,000 
transformants. One clone, pJLD21, was subcloned and the CAMP factor gene 
(cfu) was localized to a 3.2 kb BamHI fragment. The nucleotide sequence of 
cfu was determined and the deduced amino acid sequence shown to be 
homologous to the corresponding Streptococcus agalactiae (S. agalactiae) 
protein. Immunoblot analysis revealed that the recombinant strain 
containing pJLD21 expressed a protein with a molecular weight of 28,000. 
Antibodies raised against purified S. uberis CAMP factor cross-reacted 
with S. agalactiae protein B. Southern blot analysis demonstrated that the 
six CAMP-reaction positive E. coli clones contained the same CAMP factor 
gene, and this gene existed in three out of eight S. uberis strains. 

An ORF encoding a 277-residue protein was identified upstream of the 
CAMP factor gene. Sequence analysis indicated that the gene product is 
potentially a polar amino acid and opine binding protein of an ABC-type 
transport system. 

The interaction between S. uberis and bovine lactoferrin (bLf) has 
been characterized. Apo-bLf could inhibit $^{125}$I-bLf binding as 
effectively as iron-saturated bLf. Bovine transferrin, human lactoferrin 
and human transferrin did not interfere with bLf binding. The Scatchard 
plot was linear and approximately 7800 binding sites were expressed by each 
bacterial cell, with an affinity of $1.0\times 10^{-7}$ M. Heat- or 
protease treatment of bacterial cells reduced bLf binding to a great 
degree. Two components with estimated molecular weights of 165,000 and 
76,000 were originally identified from the cell wall as the functionally 
active bLf binding proteins. 
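
A linear Scatchard plot is what a single class of binding sites produces: for one-site binding, bound/free = (Bmax - bound)/Kd, so the slope is -1/Kd and the x-intercept is Bmax. The Python sketch below illustrates that relationship on synthetic data only. It assumes the reported affinity of 1.0 x 10^-7 M can be treated as a dissociation constant and uses the reported 7800 sites per cell as Bmax; the free-ligand concentrations and the function name `scatchard_fit` are hypothetical, and no measured data from the abstract are reproduced.

```python
def scatchard_fit(bound, free):
    """Least-squares line through (bound, bound/free); returns (Kd, Bmax)."""
    xs = list(bound)
    ys = [b / f for b, f in zip(bound, free)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # equals -1/Kd for one-site binding
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = -intercept / slope          # x-intercept of the Scatchard line
    return kd, bmax


# Synthetic, noise-free one-site data (assumed values, not measurements).
KD = 1.0e-7      # M, assumed dissociation constant
BMAX = 7800.0    # binding sites per cell
free = [2e-8, 5e-8, 1e-7, 2e-7, 5e-7]            # hypothetical free ligand, M
bound = [BMAX * f / (KD + f) for f in free]      # sites occupied per cell

kd_est, bmax_est = scatchard_fit(bound, free)
```

With real binding data the points scatter around the line, and curvature would suggest more than one class of sites.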

The gene coding for the bLf binding protein (lbp) of S. uberis has 
been cloned and sequenced. A single ORF encoding 561 amino acid residues 
resulted in the presence of two proteins in the recombinant E. coli cell. 
These proteins were able to bind bovine lactoferrin and had molecular 
weights of 76,000 and 165,000, similar to those detected in S. uberis. A 
putative signal peptide was found at the N terminus of the deduced amino 
acid sequence and the C terminus had the features of the membrane anchor 
motif found in other surface proteins from Gram positive bacteria. Deletion 
analysis located the bLf binding domain to a 200 amino acid region at the N 
terminus of this protein. 

The vaccine potential of recombinant CAMP factor and lactoferrin 
binding protein has been evaluated. (Abstract shortened by UMI.) 



Higgins CF. 1992. ABC transporters : from microorganisms 
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ABSTRACT: Bacterial polysaccharides are usually associated with the outer 
surface of the bacterium. They can form an amorphous layer of extracellular 
polysaccharide (EPS) surrounding the cell that may be further organized 
into a distinct structure termed a capsule. Additional polysaccharide 
molecules such as lipopolysaccharide (LPS) or lipooligosaccharide (LOS) may 
also decorate the cell surface. Polysaccharide capsules may mediate a 
number of biological processes, including invasive infections of human 
beings. Discussed here are the genetics and biochemistry of selected 
bacterial capsular polysaccharides and the basis of capsule diversity but 
not the genetics and biochemistry of LPS biosynthesis (for reviews see 100, 
140). Reprinted by permission of the publisher. 

TEXT: 

INTRODUCTION 

Polysaccharide capsules are ubiquitous structures found on the cell surface 
of a broad range of bacterial species. The polysaccharide capsule often 
constitutes the outermost layer of the cell; as such, it may mediate direct 
interactions between the bacterium and its immediate environment and has 
been implicated as an important factor in the virulence of many animal and 
plant pathogens (23, 82, 103). Capsular polysaccharides are linked to the 
cell surface of the bacterium via covalent attachments to either 
phospholipid or lipid-A molecules (140). In contrast, extracellular 
polysaccharide (EPS) molecules appear to be released onto the cell surface 
with no visible means of attachment and are often sloughed off to form 
slime. The release of polysaccharide from the cell surface must be used 
with caution as a criterion for differentiating between capsules and EPS. 
Capsular polysaccharides may themselves be released into the growth medium 
as a consequence of the stability of the phosphodiester linkage between the 
polysaccharide and the phospholipid membrane anchor. In addition, certain 
EPS molecules appear to remain tightly associated with the cell surface in 
the absence of detectable membrane anchoring (127). 

Capsular polysaccharides are highly hydrated molecules that are over 
95% water (22). They are composed of repeating single units 
(monosaccharides) joined by glycosidic linkages. They can be homo- or 
heteropolymers and may be substituted by both organic and inorganic 
molecules. Any two monosaccharides may be joined in a number of 
configurations as a consequence of the multiple hydroxyl groups within each 
monosaccharide unit that may be involved in the formation of a glycosidic 
bond. As a result of this, capsular polysaccharides are an incredibly 
diverse range of molecules that may differ not only by monosaccharide units 
but also in how these units are joined together. The introduction of 
branches into the polysaccharide chain and substitution of both organic and 
inorganic molecules yield additional structural complexity. 

In the case of human pathogens, a large number of different capsule 
serotypes have been identified. Over 80 different capsular polysaccharides 
or K antigens have been described for Escherichia coli, of which a small 
fraction are associated with invasive infections (87). The expression of 
particular K antigens can be associated with specific infections. For 
instance, E. coli that expresses the K1 antigen, a homopolymer of 
α2,8-linked N-acetylneuraminic acid (NeuNAc), is the major cause of 
neonatal meningitis (101). Certain E. coli K antigens have identical 
polysaccharide chains and differ only in modification of the 
polysaccharide. The K13, K20, and K23 antigens are all polymers of ribose 
(rib) and 2-keto-3-deoxymanno-octonic acid (KDO) that have the structure 
→3)-β-D-Rib-(1→7)-β-KDO-(2→ (131, 129). However, in the case of the K13 
and K20 antigens, the molecule is O-acetylated at the KDO and rib units 
respectively (129, 131). Chemically identical capsular polysaccharides may 
also be synthesized by different bacterial species. The Neisseria 
meningitidis group B capsular polysaccharide is identical to the K1 polymer 
of E. coli (48), and the E. coli K18, K22, and K100 antigens have the same 
constituents and a structure similar to serotype b capsule of Haemophilus 
influenzae (59). The apparent conservation of particular capsular 
polysaccharide structures between taxonomically diverse bacterial genera 
raises interesting questions concerning the evolution of capsule diversity 
and the acquisition and transmission of capsule biosynthesis genes. 

FUNCTIONS OF BACTERIAL CAPSULES 
A number of possible functions have been suggested for polysaccharide 
capsules (Table 1). 



PREVENTION OF DESICCATION 
The formation by capsules of a hydrated gel around the surface of the 
bacteria may protect the bacteria from the harmful effects of desiccation 
(102). This may be particularly relevant in aiding the transmission of 
encapsulated pathogens from one host to the next. Mucoid isolates of E. 
coli, Acinetobacter calcoaceticus, and Erwinia stewartii are more resistant 
to drying than are isogenic nonmucoid strains (85). In the case of E. 
coli, desiccation increases expression of the genes encoding enzymes for 
the biosynthesis of colanic acid (cell-surface slime) (85). The mechanism 
by which bacteria might regulate capsule expression in response to 
desiccation is unclear. One possible mechanism is that desiccation changes 
the external osmolarity, which triggers increased capsule biosynthesis. 
Indeed, both alginate production by Pseudomonas aeruginosa (9) and 
expression of the Vi antigen in Salmonella typhi increase in response to 
high external osmolarity (98). 



ADHERENCE 

Capsular polysaccharides may promote the adherence of bacteria both to 
surfaces and to each other, and they thereby facilitate the formation of a 
biofilm and the colonization of various ecological niches (21). During 
colonization of oral surfaces and the temporal development of bacterial 
plaque, specific colonizing bacteria may provide bridges for the subsequent 
attachment of other bacterial species (66). This intergeneric interaction 
that establishes microbial consortia within the biofilm is mediated in part 
through lectin-ligand interactions that involve cell-surface 
polysaccharide molecules (64). The formation of biofilms may offer the 
individual bacteria protection from phagocytic protozoa and infection by 
bacteriophages as well as nutritional advantages. 

The ability of bacteria to attach to surfaces and establish a biofilm 
can have far-reaching consequences. The colonization of indwelling 
catheters in hospitalized patients can lead to serious nosocomial 
infections. The overexpression of alginate by Pseudomonas species within 
the lungs of cystic fibrosis patients forms an alginate-rich biofilm that 
may present a permeability barrier to antibiotics (47). The fouling of 
pipes in industrial processes due to biofilm formation can lead to 
substantial economic losses and delays in production while biofilms are 
removed (21). In contrast to these adherent properties, capsular 
polysaccharides may have lubricant properties that facilitate the swarming 
of Proteus mirabilis over solid substrata by reducing friction (50). 

RESISTANCE TO NONSPECIFIC HOST IMMUNITY 
During invasive bacterial infections, interactions between the capsular 
polysaccharide and the host's immune system can decide the outcome of the 
infection (105). In the absence of specific antibody, the presence of a 
capsule is thought to confer resistance to nonspecific host defense 
mechanisms. These responses include activation of the complement cascade 
via the alternative pathway and of the C3b-mediated opsonophagocytosis by 
polymorphonuclear leukocytes. Both of these responses provide protection 
in the preimmune host when specific antibodies are absent. The 
alternative pathway is initiated by the nonspecific binding of the serum 
protein C3b to the bacterial cell surface. The bound C3b is then activated 
by interaction with factor B and forms the C3 convertase C3bBb. This leads 
to the binding of more C3 and the formation of the membrane attack complex 
on the outer membrane of the bacteria, which leads to lysis and death (38). 
The capsule may resist complement-mediated killing by providing a 
permeability barrier to complement components, thereby masking underlying 
cell surface structures that would otherwise be potent activators of the 
alternative pathway (56). 

Capsular polysaccharides containing NeuNAc are poor activators of the 
alternative pathway (32, 80, 122). Poor activation may be achieved by 
NeuNAc-containing polysaccharides directly binding factor H (80). The 
bound factor H acts as a cofactor to promote the binding of factor I with 
C3b to form iC3b, which breaks the amplification loop of the complement 
cascade and thereby prevents the formation of the membrane attack complex 
(38). The capsule usually acts in concert with other cell-surface 
structures such as O antigens to confer resistance to complement-mediated 
killing (65). As a result, a particular combination of cell-surface 
structures is responsible for conferring a high degree of resistance to 
complement-mediated killing (23, 65). 

Capsular polysaccharides may confer resistance to complement-mediated 
opsonophagocytosis. Steric effects, in which the capsule masks the 
underlying C3b deposited on cell surface structures from C3b receptors on 
the phagocyte cell surface, may be responsible. The net negative charge 
conveyed on the cell surface by the polysaccharide capsule may also serve 
to confer resistance (16, 55, 82). The more highly charged the capsular 
polysaccharide is, the greater the degree of resistance to phagocytosis 
(82). In addition to these direct interactions between the bacterial 
capsule and components of the host's nonspecific immune response, certain 
capsular polysaccharides may modulate the ability of the host to mediate 
an immune response by effecting the release of cytokine molecules, thereby 
disrupting the coordination of the host's cell-mediated immune response 
(23). 

RESISTANCE TO SPECIFIC HOST IMMUNITY 
Although most capsular polysaccharides can elicit an immune response, a 
small set of capsular polysaccharides are poorly immunogenic. These 
include polysaccharides containing NeuNAc, such as E. coli K1 or N. 
meningitidis serogroup B (10), and the E. coli K5 antigen, which is similar 
to desulfoheparin (130). As a consequence of structural similarities 
between these capsular polysaccharides and polysaccharides encountered on 
host tissue (37, 76), these capsules are poorly immunogenic, and infected 
individuals mount a poor antibody response to such capsules (105, 142). 
Therefore, the expression of these capsules confers some measure of 
resistance to the host's specific immune response. 



BACTERIA-PLANT INTERACTIONS 
Capsular polysaccharides play important roles in the mediation of 
microbe-plant interactions. Many phytopathogenic bacteria elaborate 
capsular polysaccharides, the expression of which is essential for 
virulence. Unlike their role in human and animal pathogens, the role of 
capsular polysaccharides in the plant disease process has not been fully 
elucidated. In the case of certain vascular pathogens such as Pseudomonas 
solanacearum, the capsular polysaccharide is not required for invasion and 
growth of the bacteria in planta but is necessary for plant death, which is 
probably caused by occlusion of the xylem vessels (24). With other 
phytopathogens such as Erwinia amylovora, the expression of a 
polysaccharide capsule is essential for growth in planta and may act to 
mask cell-surface molecules that might otherwise elicit a defense response 
in the plant (18). 

The expression of capsular polysaccharides is vital in the 
establishment of symbiotic relationships between bacteria and plants. 
Rhizobium meliloti, which infects leguminous plants to form nitrogen-fixing 
root nodules, produces succinoglycan, a cell-surface polymer made up of 
octasaccharide units, each composed of one galactose and seven glucose 
residues with acetyl, succinyl, and pyruvyl substitutions (45). Mutants 
of R. meliloti that are unable to make succinoglycan can induce nodule 
formation on alfalfa but do not penetrate or colonize the nodule, which 
indicates a role for succinoglycan in nodule invasion and development (84). 

GENETICS OF CAPSULE GENE CLUSTERS IN GRAM-NEGATIVE BACTERIA 
Capsule gene clusters have been cloned from a number of gram-negative 
bacteria, including E. coli (104, 118), H. influenzae (70), N. meningitidis 
(42), S. typhi (53), P. solanacearum (57), Klebsiella pneumoniae (3), 
Erwinia stewartii (29), and E. amylovora (18). In all these cases the 
capsule genes are clustered at a single chromosomal locus, which allows the 
coordinate regulation of a large number of genes that may be involved in 
the biosynthesis and export of capsular polysaccharides. The capsule gene 
clusters of E. coli, the most studied forms to date, may be regarded as a 
paradigm for capsule gene clusters in gram-negative bacteria. 

CAPSULE GENE CLUSTERS IN E. COLI 
Escherichia coli can produce over 80 chemically distinct capsular 
polysaccharides (K antigens) (87). The K antigens of E. coli were divided 
into groups I and II on the basis of their biological and chemical 
properties (59). The two-group classification now appears to be 
inadequate for a number of reasons. First, group I K antigens include two 
separate capsule types (see below). Second, at the genetic level, at least 
two different groups of E. coli capsule gene clusters have recently been 
identified at the serA locus on the E. coli chromosome (96). 



GROUP I K ANTIGENS 

Group I K antigens have higher molecular weights than those of group II and 
lower charge density, are expressed at all growth temperatures (59), and 
produce thicker capsules (7). They contain hexuronic acid as acidic 
components, may contain amino sugars, and are coexpressed, usually with O8 
or O9 and rarely O20 antigens of E. coli (62). Group I K antigens have 
been subdivided into groups Ia and Ib based on the presence of amino sugars 
(62). Group Ia K antigens do not contain amino sugars, and they resemble 
the capsular polysaccharides of Klebsiella species. Strains expressing 
group Ia K antigens are unable to express cell-surface colanic acid or M 
antigen, which suggests that these genes may be allelic (63). In contrast, 
group Ib K antigens, which contain amino sugars and have no obvious 
counterpart in other bacteria, are able to express colanic acid (63). Group 
I K antigens are linked to the cell surface by lipid A-core in a manner 
analogous to that of lipopolysaccharide (LPS) (60). However, careful 
analysis of both group Ia (K30) and Ib (K40) polysaccharides indicates 
that the situation is more complex. Analysis of the size of the 
polysaccharide attached to lipid A-core reveals differences between group 
Ia and Ib K antigens. In the case of the K30 antigen, the polysaccharide 
linked to lipid A-core consists primarily of one repeat unit of the K30 
polysaccharide with a high molecular weight polysaccharide unlinked to 
lipid A-core (77). This, together with the observation that in rfa 
mutants, which are unable to make lipid A-core, a K30 capsule can still be 
visualized by electron microscopy (77), indicates that either the 
polysaccharide is held on the cell surface independent of any covalent 
attachment or is anchored by some other molecule apart from either lipid 
A-core or phospholipid. The presence of both lipid A-core substituted and 
unsubstituted forms of the K30 antigen on the cell surface may indicate 
that these two different species are exported from the cell by different 
pathways. 

In the case of the E. coli K40 antigen (group Ib), the majority of the 
capsular material consists of high molecular weight polysaccharide chains 
that are linked to lipid A-core; only a small fraction are not substituted 
with lipid A-core (27). This is much more typical of an LPS molecule. 
Further evidence lends support to the notion that group Ib K antigens 
should be considered as O antigens. The first line of evidence is the 
effect of the rol gene product on the expression of group Ib K antigens. 
The rol gene controls the length of polymerization of heteropolymeric LPS 
O antigens in a range of Enterobacteriaceae (5, 6, 81). In the case of 
group Ib K antigens, the rol gene is present on the chromosome, and 
multiple copies of the rol gene reduce the chain length of group Ib K 
antigens in a way analogous to the effects on O antigen chain length (27). 
Group Ia strains lack a detectable rol gene. Second, in different E. coli 
strains, chemically identical polysaccharides are serologically classified 
as either an O or a group Ib K antigen (59). On the basis of this evidence, 
the classification of group I K antigens of E. coli may be too rigid. 

The genes for the production of group Ia K antigens are located 
proximal to the his and near the rfb gene clusters on the E. coli 
chromosome (72, 138). The genes for group Ib K antigens have been assumed 
to be located at the same region of the chromosome, but this has not yet 
been demonstrated. A second locus near the trp gene cluster has also been 
implicated in the expression of the K27 antigen (112), although the role of 
this trp-linked marker in the expression of other group Ia K antigens 
awaits elucidation. Part of the cps gene cluster for the production of 
colanic acid in E. coli has been cloned and analyzed (2). To date, this 
represents the only group Ia capsule gene cluster that has been cloned from 
E. coli. Analysis of the sequence identified six open reading frames 
(ORFs). The two genes cpsB and cpsG encode the enzymes mannose-1-phosphate 
guanyltransferase and phosphomannomutase respectively, and together with 
the other ORFs, they are involved in the generation of GDP-mannose and 
GDP-fucose, which are component sugars of colanic acid. The E. coli CpsB 
and G proteins are homologous respectively to the CpsB and G and the RfbM 
and K proteins from the cps and rfb gene clusters of Salmonella enterica 
LT2 (2, 123). The high G+C ratio of the cloned cps genes from both E. coli 
and S. enterica LT2 suggests that the genes have been acquired from 
another organism with a high G+C DNA composition. The changes at the third 
base position suggest that the acquisition of these genes occurred much 
earlier in E. coli than in S. enterica LT2, about 45 million years ago (2). 

If, as is likely, group Ib K antigen gene clusters are located at the 
his region of the E. coli chromosome, then this region of the chromosome 
must contain the group Ib capsule gene cluster, the cps gene cluster for 
colanic acid, and the rfb gene cluster for the production of an O antigen 
together with a rol gene. Study of this region of the chromosome in group 
Ia and Ib strains will be fascinating and will help elucidate the 
relationships between these different group I K antigen gene clusters and 
their relationships to rfb gene clusters. 

GROUP II K ANTIGENS 
Group II K antigens have a higher charge density than those of group I and 
may contain hexuronic acids, NeuNAc, or KDO as acid components (62). They 
are coexpressed with many O antigens and are not expressed at growth 
temperatures below 20°C (88). For certain group II K antigens, 
phosphatidic acid exists at the reducing end, and this may act as a 
membrane anchor that links the K antigen to the cell surface (59, 113). 
In contrast to E. coli strains that express group I K antigens, strains 
that express group II K antigens have elevated levels of the enzyme CMP-KDO 
synthetase at capsule permissive temperatures, and for certain group II K 
antigens, KDO attaches to the reducing end of the polysaccharide (35, 36). 
The genes that encode group II K antigens are termed kps and have been 
mapped near serA on the E. coli chromosome (90, 133). 

The genes for a number of group II K antigen gene clusters have been 
cloned and analyzed. These include the genes for the K1 (118, 134), K4 
(31), K5, K7, K12, and K92 antigens (104, 106). 

Detailed molecular analysis of group II K antigen gene clusters 
revealed that different K antigen gene clusters have a conserved genetic 
organization that consists of three functional regions (Figure 1) (12, 13, 
106). Regions 1 and 3 are conserved between different group II K antigen 
gene clusters, while the central region 2 is serotype specific (106, 107). 
Mutations within region 2 abolish polysaccharide biosynthesis, which 
suggests that this region encodes enzymes for the synthesis of particular K 
antigens (13, 106, 116). In the case of group II K antigens that contain 
sugars not normally ubiquitous in E. coli, additional enzymes involved in 
the biosynthesis of the appropriate nucleotide sugar precursors are also 
encoded within region 2. For instance, in the case of the K1 antigen, 
region 2 encodes three enzymes that are involved in the biosynthesis and 
activation of NeuNAc (132). Likewise, region 2 of the K5 capsule gene 
cluster encodes a UDP-Glc dehydrogenase enzyme for the synthesis of 
UDP-Glucuronic acid (GlcA), which is a component sugar of the K5 
polysaccharide (97, 114). A correlation exists between the size of region 
2 and the complexity of the K antigen. For instance, the K4 antigen, which 
is a complex branched heteropolymer (108), has a region 2 of 14 kb, in 
contrast to K1 region 2 of 5.8 kb (Figure 1) (11). 

The best-studied group II capsule gene clusters to date are the K1 and 
K5 capsule gene clusters; the entire nucleotide sequence is now determined 
for the K5 capsule gene cluster (97, 103). The A+T ratio of the DNA 
within region 2 of the K5 capsule gene cluster was 66.6% (97), 
which is higher than the 50% A+T normally associated with E. coli 
chromosomal DNA (86). A similarly high A+T ratio has been reported for 
region 2 of the K1 capsule gene cluster (121). 
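
The A+T ratio quoted above is the percentage of bases in a sequence that are A or T. A minimal Python sketch of that calculation follows; the function name `at_content` and the rule of ignoring any character other than A, C, G, or T are assumptions made for illustration, and the short test strings are invented, not capsule gene sequence.

```python
def at_content(sequence: str) -> float:
    """Percentage of A and T among the unambiguous bases of a DNA sequence.

    Assumption: characters other than A, C, G, T (for example N or
    whitespace) are ignored rather than treated as errors.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    at_bases = sum(1 for base in counted if base in "AT")
    return 100.0 * at_bases / len(counted)
```

Applied to a full region 2 sequence, a value well above the roughly 50% typical of the E. coli chromosome is the kind of signal the text describes.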

Region 2 of the K5 capsule gene cluster is 8.0 kb and contains four 
genes with two large intergenic spaces (Figure 2) (97). Northern blot 
analysis and transcript mapping experiments identified three promoters 
within the K5 region 2 and revealed a complex pattern of transcription 
(97). Transcription from the kfiA promoter generates a transcript of 8.0 
kb which spans the entire region 2, including the two large intergenic 
regions between the kfiA and B genes and kfiB and C genes (Figure 2). Two 
smaller transcripts of 6.0 and 3.1 kb originate from the intergenic 
regions between the kfiA and B genes and the kfiB and C genes (Figure 
2) (97). Region 2 of the K1 capsule gene cluster is 5.8 kb and contains 
six genes, all of which are transcribed in the same direction as region 3 
(1, 115, 132). 

The serotype-specific region 2 is flanked by regions 1 and 3 (Figures 
1 and 2). Both regions 1 and 3 are conserved between different group II K 
antigen gene clusters and encode common functions (103). The mechanism of 
acquisition of different forms of region 2 at this site is discussed 
below. The complete nucleotide sequence has been determined for the entire 
region 1 of the K5 capsule gene cluster (93, 94) and partially determined 
for region 1 of the K1 capsule gene cluster (20, 141). The A+T ratio of 
the DNA was 50.6%, a value typical for E. coli (93). Region 1 
contains six genes, kpsFEDUCS (Figure 2), two of which, kpsU and kpsC, 
appear to be translationally coupled (93, 94). Translational coupling is a 
proposed mechanism by which balanced expression of two proteins may be 
achieved (144), which may be of particular significance if the two proteins 
interact in some form of complex. 

Northern blotting and transcript mapping confirm that region 1 is 
organized in a single transcriptional unit with the promoter located 225 bp 
upstream from the initiation codon for the first gene kpsF (Figure 2) (D 
Simpson & I Roberts, unpublished data). Analysis of the promoter revealed 
no similarities with promoters recognized by alternative sigma factors. 
Transcription from the region 1 promoter was temperature regulated, and 
there was no detectable transcription at 18°C, a nonpermissive 
temperature for capsule expression. 

Region 3 of group II capsule gene clusters contains two genes kpsM and 
T that are organized as a single transcriptional unit (Figure 2) with both 
genes translationally coupled (92, 119). This coupling suggests, as 
mentioned above, that the KpsM and T proteins are likely to interact in 
some form of complex. The promoter for the K5 region 3 is 741 bp upstream 
of the initiation codon for KpsM and has no similarities with promoters 
recognized by alternative sigma factors in E. coli (M Stevens & I Roberts, 
unpublished data). Because of the high degree of identity between region 3 
from the K1 and K5 capsule gene clusters, the promoter is probably located 
at a similar site in the K1 capsule gene cluster. 

Transcription of region 3 is in the same direction as transcription in 
region 2 in both the K1 and K5 capsule gene clusters (97, 115). 
Transcription from the region 3 promoter was temperature regulated in a 
manner analogous to that of the region 1 promoter (M Stevens & I Roberts, 
unpublished data). 

GROUP III K ANTIGENS 
Recently, the genes for a third group of E. coli capsule gene clusters have 
been cloned (96). This group, typified by the K10 and K54 antigens, is 
encoded by genes that map to the same serA site on the chromosome as do 
group II capsule genes (89, 90) . Preliminary genetic analysis of the K10 
and K54 capsule gene clusters (96) suggests that group III capsule gene 
clusters have a conserved genetic organization that contains a central 
serotype-specif ic region flanked by group III capsule-specific sequences in 
a manner analogous to that of group II capsule gene clusters (96) . 
However, the group III capsule genes appear to have little detectable 



nucleotide sequence in common with the group II capsule genes, which 
suggests that these capsule gene clusters may have originated from a 
different source than that of the group II capsule genes, but have been 
inserted at the same serA site on the E. coli chromosome (96). 

ANALYSIS OF CAPSULE GENE CLUSTERS FROM OTHER GRAM-NEGATIVE BACTERIA 
The genes have been cloned for a number of both group I-like and group 
II-like capsules from a range of gram-negative bacteria. The genes for the 
production of group I-like capsules have been cloned from E. amylovora, 
Erwinia stewartii, P. solanacearum, and K. pneumoniae (3, 18, 29, 57). In 
these cases the genes are clustered. Both the E. amylovora and P. 
solanacearum capsule gene clusters appear to be transcribed as single large 
transcripts, although other minor promoters may be present within the gene 
clusters and their roles cannot be ruled out (18, 57). 

The genes for the production of group II-like capsules from 
Haemophilus influenzae type b, Neisseria meningitidis group B, and the 
Salmonella typhi Vi antigen have been cloned (42, 53, 70). The 
organization of these capsule gene clusters is in many ways remarkably 
similar to that of the group II capsule gene clusters of E. coli. The H. 
influenzae and N. meningitidis capsule gene clusters are aligned with the 
E. coli K5 cluster (Figure 2). 

The H. influenzae type b capsule gene (cap) cluster has a conserved 
genetic organization (Figure 2). A central serotype-specific region 2 
encodes the polysaccharide biosynthetic functions (128) and is flanked by 
regions 1 and 3, which are common to all of the H. influenzae serotypes 
(68, 70). Region 1 of the type b cap cluster contains four genes, bexABCD, 
that are probably organized as a single transcriptional unit (68). The 
nucleotide sequence of region 2 contains four genes, one of which encodes 
CDP-ribitol pyrophosphorylase, an enzyme required for the biosynthesis of 
the type b polysaccharide (128). The cap locus is duplicated in most type 
b strains: two directly repeated copies of 17 kb are linked by a bridge 
region that contains the single unique copy of the bexA gene, which is the 
last gene in the cap gene cluster (67, 69). The mechanisms by which the H. 
influenzae cap locus may be lost or its copy number increased are 
discussed below. 

The capsule gene cluster (cps) from N. meningitidis group B has a 
complex segmental organization (Figure 2). Originally, five regions were 
identified within the cps gene cluster (42). Region A was thought to be 
involved in polysaccharide biosynthesis, regions C and D in polysaccharide 
transport, and regions D and E in regulation of capsule expression (42). 
Further experimental work has redefined this preliminary hypothesis. 
Region C contains four genes, ctrABCD, that are most likely organized in a 
single transcriptional unit (39), while region B contains two genes termed 
lipAB (Figure 2), originally thought to be organized as two transcriptional 
units (40). However, careful analysis of the nucleotide sequence indicates 
that the lipA gene is larger than first thought, and therefore the two 
genes are likely to be transcribed as a single unit (I Roberts, unpublished 
results). Four genes were identified within region A (Figure 2) that code 
for proteins involved in the biosynthesis of the group B polysaccharide 
(33, 43). Further analysis of the nucleotide sequence of region D allowed 
identification of the galE gene and the three genes rfbBCD (51). The 
rfbBCD genes encode proteins homologous to those involved in the 
biosynthesis of rhamnose-containing LPS molecules in Salmonella 
typhimurium (51). The absence of any detectable rhamnose-containing 
polysaccharides in meningococci and the incomplete expression of all of the 
rfb genes are puzzling. When the galE gene was expressed, mutations in 
the gene resulted in truncated lipooligosaccharide (LOS) that was not 
sialylated (51). The lack of any LOS or LPS genes within the H. 
influenzae and E. coli capsule gene clusters suggests that, in the case of 
the N. meningitidis group B capsule gene cluster, the insertion of the LOS 
genes occurs by some form of chromosomal rearrangement within meningococci. 

THE BIOCHEMISTRY OF CAPSULE PRODUCTION IN GRAM-NEGATIVE BACTERIA 
The biochemistry of E. coli group II capsule production has been studied 
exhaustively (35, 62, 126, 139). A combination of both biochemical and 
genetic techniques has revealed much about the mechanisms by which group II 
polysaccharides are synthesized and exported in E. coli (103). 

THE EXPORT OF GROUP II POLYSACCHARIDES 
Although fundamental differences may exist between the export of group 
II-like capsules in E. coli, H. influenzae, and N. meningitidis, uniform 
themes are conserved in the expression of polysaccharide capsules in these 
taxonomically diverse bacteria. It is unclear whether this reflects a 
common ancestry of capsule biosynthesis genes (41) or is instead the result 
of underlying chemical constraints that limit the diversity of the process. 
As a consequence, much of what has been learned about the export of group 
II capsules in E. coli can be extrapolated to capsule expression in these 
other bacteria. 

The functions performed by the proteins encoded by regions 1 and 3 in 
the export of group II capsular polysaccharides in E. coli are only 
partially elucidated. Mutations within region 3 result in cytoplasmic 
polysaccharide associated with the inner face of the cytoplasmic membrane 
(13, 71, 91, 99, 119). This suggests that the proteins encoded by region 3 
are involved in the export of group II polysaccharides across the 
cytoplasmic membrane, and it confirms that polysaccharide biosynthesis 
occurs on the inner face of the cytoplasmic membrane. Analysis of the 
predicted amino acid sequence of the KpsM and T proteins indicates that 
they are ATP-binding cassette (ABC) transporters and may comprise an 
inner-membrane polysaccharide export system (91, 92, 119). 

ABC transporters are ubiquitous, and they are involved in a 
diverse range of import and export systems in both prokaryotes and 
eukaryotes (28). These include bacterial transport systems for the uptake 
of oligopeptides and amino acids (54) as well as for the export of 
polysaccharides (34). Members of the ABC superfamily have a common 
organization that consists of a hydrophobic integral membrane protein and a 
hydrophilic, membrane-associated ATP-binding protein (28). The transport 
complex consists of two of each component, and in some cases the different 
components are fused into a single multi-domain protein (54). 

In the case of E. coli group II polysaccharide export, the KpsT 
protein has an ATP-binding fold, and a hydrophobicity plot of the predicted 
amino acid sequence of KpsM reveals a protein with at least six potential 
membrane-spanning domains (92, 119). Site-directed mutagenesis and binding 
of azido-linked ATP have confirmed the role of the KpsT protein as an 
ATP-hydrolyzing protein (91, 115, 117). Structure-function studies on the 
KpsM protein confirm that its membrane topology has six membrane-spanning 
domains with two cytoplasmic and three periplasmic loops (99, 115). 
Therefore, the group II polysaccharide transport complex is likely to 
consist of two molecules of KpsM that may form some type of inner 
membrane-spanning pore, associated with two molecules of KpsT that hydrolyze 
ATP and energize the transport process. Mutations in region 3 of the K1 
capsule gene cluster could be complemented in trans by the cloned region 3 
of the K5 capsule genes, which indicates that the KpsM and T proteins are 
able to export group II polysaccharides independently of the repeat 
structure of the polysaccharide (106). 
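
The hydrophobicity-plot reasoning above can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale, a 19-residue sliding window, and a cutoff of 1.6; the review does not state which scale, window, or cutoff was used for KpsM. The sequence is a hypothetical toy string, not the KpsM sequence, and the function names are illustrative only.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the text names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_segments(profile, threshold=1.6):
    """Return (first, last) window indices of each run above the threshold."""
    segments, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a hydrophobic run begins
        elif value <= threshold and start is not None:
            segments.append((start, i - 1))  # the run ended at the previous window
            start = None
    if start is not None:                  # run extends to the last window
        segments.append((start, len(profile) - 1))
    return segments


# Hypothetical toy sequence: charged flank, hydrophobic core, charged flank.
toy = "DKRENDKRENDKREN" + "LIVALIVALIVALIVALIVAL" + "DKRENDKRENDKREN"
profile = hydropathy_profile(toy)
segments = candidate_segments(profile)
print(segments)
```

Each run of windows above the cutoff marks one candidate membrane-spanning stretch; counting such runs along a real sequence gives only a prediction, which is why the topology of KpsM was tested experimentally.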

The homology of the KpsM and T proteins and the proteins that are 
encoded by the H. influenzae cap cluster and N. meningitidis cps cluster 
(Table 2) suggests that ABC transport systems operate to export 
capsular polysaccharides across the cytoplasmic membrane in these bacteria 
(39, 69). Mutations in region 3 of the K5 capsule gene cluster are 
complemented by cloned genes from Actinobacillus pleuropneumoniae (137), 
which confirms the functional conservation that appears to exist in the 
ABC transport systems involved in the export of group II polysaccharides 
in gram-negative bacteria. 

The functional conservation in the export of capsular polysaccharides 
in gram-negative bacteria has been confirmed by computer-aided database 
searches. These searches of the predicted amino acid sequence of the six 
proteins that are encoded by region 1 revealed similarities to proteins 
that are involved in the expression of polysaccharide capsules in other 
bacteria (Table 2). The KpsF protein has no homologue in other capsule 
gene clusters but has significant similarity to the GutQ protein (Table 2), 
a hypothetical protein ORF328 of E. coli, and a KpsF homologue in H. 
influenzae (D Simpson & I Roberts, unpublished data). The GutQ protein may 
act as a regulator of the glucitol operon in E. coli (143); thus, the KpsF 
protein may have a regulatory role in the expression of group II capsules 
in E. coli (20). However, mutants lacking a functional kpsF gene are still 
able to express a K5 capsule, and they demonstrate temperature-dependent 
capsule expression (14, 93, 94). In order to preclude the possibility that 
the mutation in the kpsF gene was complemented in trans by the gutQ gene, 
the experiments were repeated in a gutQ mutant. In this case, a 
temperature-dependent expression of capsule genes was still observed (D 
Simpson & I Roberts, unpublished data). Therefore, the role of the KpsF 
protein is still unknown, but the high degree of identity (95%) 
between the KpsF proteins encoded by the K1 and K5 capsule gene clusters 
would suggest some functional role for the KpsF protein. The situation is 
not made any clearer by the identification of a KpsF homologue in H. 
influenzae because the corresponding kpsF gene is not located within the H. 
influenzae cap gene cluster (D Simpson & I Roberts, unpublished data). 

Mutations within the kpsE and kpsD genes result in periplasmic 
polysaccharide (13, 14), which suggests a role for these two proteins in 
the export of polysaccharide across the periplasmic space. The KpsE 
protein is homologous to the BexC protein of H. influenzae and the CtrB 
protein of N. meningitidis (Table 2), both of which are implicated in the 
export of capsular polysaccharides in these two bacteria (39, 68). Rosenow 
et al have purified the KpsE protein encoded by the K5 capsule gene cluster 
and generated antisera to the protein (110). Both biochemical and genetic 
techniques allow elucidation of the topology of the KpsE protein within the 
inner-membrane (110). Analysis of KpsE-BlaM fusions demonstrates that the 
KpsE protein has a cytoplasmic N-terminus with a large periplasmic domain 
of approximately 300 amino acids and a C-terminal membrane-associated 
domain that is unlikely to extend across the inner-membrane into the 
cytoplasm (F Esumeh & I Roberts, unpublished data). This arrangement of 
the KpsE protein within the inner-membrane suggests that the periplasmic 
domain of the KpsE protein may be functionally important in the export of 
group II polysaccharides. Based on predicted amino acid sequence, both the 
BexC and CtrB proteins are likely to have similar topologies within the 
inner-membrane. 

The KpsD protein is a periplasmic protein with a typical N-terminal 
signal sequence (93, 94, 141). Mutations in the kpsD gene result in 
periplasmic polysaccharides, which suggests a role for KpsD in the passage 
of group II polysaccharides across the periplasmic space following export 
across the cytoplasmic membrane (14, 141). The KpsD protein likely 
interacts with the periplasmic domain of the KpsE protein when mediating 
the export of group II polysaccharides across the periplasmic space. 
However, a homologue to the KpsD protein is not encoded by either the H. 
influenzae or N. meningitidis capsule gene clusters. The KpsD protein is 
homologous to the ExoF protein (Table 2), which is implicated in the 
expression of succinoglycan in R. meliloti (83). In the case of E. coli 
group II capsular polysaccharides, the means of export of polysaccharide 
across the outer membrane is not clear. No outer-membrane protein is 
encoded by the kps gene cluster, unlike the capsule gene clusters of H. 
influenzae and N. meningitidis, which both encode an outer-membrane protein 
that may play a role in the final stages of capsular polysaccharide export 
(41, 68). The lack of an outer-membrane protein encoded by the kps gene 
cluster means either that another outer-membrane protein may perform this 
function, or that export of capsular polysaccharide in E. coli onto the 
cell surface requires no outer-membrane protein and is achieved in a 
fashion different from that in H. influenzae and N. meningitidis. E. coli 
capsular polysaccharide export onto the cell surface may be achieved by the 
formation of membrane fusions or Bayer sites between the inner and outer 
membrane (103). While this suggestion is appealing because of the possible 
cycling of phospholipid-linked polysaccharide through the fused membranes, 
little experimental evidence exists at this stage to support this notion. 

The KpsC and KpsS proteins are located in the cytoplasm associated 
with the inner face of the cytoplasmic membrane (G Rigg & I Roberts). 
Mutations within either the kpsC or kpsS genes result in aggregates of 
polysaccharide that accumulate within the cytoplasm and lack both 
phospholipid and KDO, which are normally found at the reducing end of 
cell-surface polysaccharide (14, 15). Therefore the KpsC and KpsS proteins 
may be involved in the attachment of KDO to phospholipid and the subsequent 
ligation of the polysaccharide to the phosphatidyl-KDO prior to export 
across the cytoplasmic membrane by KpsM and KpsT (103). Such an 
interpretation suggests that the presence of phosphatidyl-KDO at the 
reducing end of the polysaccharide molecule is the motif recognized by the 
export proteins, which might explain how a conserved set of proteins could 
export a broad range of chemically diverse group II polysaccharides. 
Indeed, a similar lipidation function has been suggested for the LipA and 
LipB proteins encoded by the N. meningitidis capsule gene cluster (40), 
which are homologous to KpsC and KpsS, respectively (Table 2). However, to 
date, no biochemical evidence for the function of either KpsC or KpsS 
proteins is available. Furthermore, KDO has not been demonstrated to be at 
the reducing terminus of all group II polysaccharides in E. coli. 

The KpsC protein is also homologous to LpsZ (Table 2) (93). This 
protein is responsible for modification of LPS molecules in R. meliloti 
(17). The functional significance of this similarity is unclear, 
especially in conjunction with evidence that the cloned lpsZ gene does not 
complement a kpsC mutant (I Roberts, unpublished data). 

The KpsU protein is 44% identical to the KdsB enzyme of E. 
coli (93, 94). The KdsB enzyme is the CMP-KDO synthetase enzyme that 
catalyzes the formation of CMP-KDO prior to the linkage of KDO to lipid A 
(46). Rosenow et al purified the KpsU protein and demonstrated it to be a 
CMP-KDO synthetase enzyme (109), which explains the elevated levels of this 
enzyme in E. coli strains expressing group II capsules (36). The KpsU 
enzyme probably generates CMP-KDO, which is a substrate for the attachment 
of KDO to phospholipid (103). Mutations in the kpsU gene of the K5 
capsule gene cluster do not abolish capsule production (14), most likely 
owing to complementation of the defect by the kdsB gene. 
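
Percent-identity figures such as those quoted above depend on how the alignment is scored. The following is a minimal sketch of one common convention, identical residues divided by ungapped alignment columns; the review does not say which convention or alignment program was used, and the two aligned strings are hypothetical toy sequences, not KpsU or KdsB.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue          # skip columns that contain a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical pre-aligned toy sequences (gaps shown as '-').
toy_a = "MKT-LLVAGG"
toy_b = "MRTALLIAG-"
print(percent_identity(toy_a, toy_b))  # 6 matches over 8 ungapped columns
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values from different reports are comparable only when the same denominator is used.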

Mutations in any of the genes in regions 1 or 3 result in a 
significant reduction in the membrane transferase activity (15, 91). Such 
a disruption of polysaccharide export affects polysaccharide biosynthesis, 
indicating that the two are linked. The proteins involved in 
polysaccharide synthesis and export presumably form a multi-protein complex 
on and through the inner-membrane with many protein-protein interactions 
(103). A multi-protein complex for the E. coli K5 antigen does exist on 
the inner-membrane, and mutations in particular region 1 genes have 
pronounced pleiotropic effects on the assembly of this complex (G Rigg & I 
Roberts, unpublished data). 

The processes involved in the export of group II polysaccharides in E. 
coli are beginning to be clarified by research and by extrapolation in 
other gram-negative bacteria. Despite common themes, export processes may 
differ significantly: for instance, in the apparent lack of an 
outer-membrane protein in the transport process in E. coli, in the role of 
KDO in the transport process, and in the lack of an obvious 
capsule-associated periplasmic protein in both H. influenzae and N. 
meningitidis. 

GROUP II POLYSACCHARIDE BIOSYNTHESIS 
The biosynthesis of the E. coli K1 and K5 antigens and the group B capsule 
of N. meningitidis has been extensively studied (35, 62, 78, 79, 121, 126, 
139). In the case of the E. coli K1 and K5 antigens, the polysaccharide 
grows at its nonreducing end by the sequential addition of monosaccharide 
units (35, 120, 139). The biosynthesis of NeuNAc is achieved by the 
conversion of N-acetylmannosamine-6-phosphate to N-acetylmannosamine 
(ManNAc), which is probably catalyzed by the NeuC protein (136). The NeuB 
protein then catalyzes the condensation of ManNAc with phosphoenolpyruvate 
to generate NeuNAc (1), which is activated to form CMP-NeuNAc by the NeuA 
enzyme (145). The polymerization of the K1 antigen is then achieved by 
NeuS, a processive polysialyltransferase enzyme (120). Functions for the 
NeuD and NeuE proteins encoded by region 2 of the K1 capsule gene cluster 
are less well known. Mutations in the neuD gene abolish K1 production, and 
NeuD possibly modifies other proteins involved in K1 biosynthesis (1). 
Mutations in the neuE gene result in intracellular K1 polysaccharide (135). 
NeuE may play a role in the assembly of the polymerization-export complex, 
and the ability of NeuE to bind to undecaprenol phosphate may be essential 
for this process (126). Whatever the function of NeuE, it is likely to be 
specific to E. coli group II capsules that contain NeuNAc. 

The steps involved in the biosynthesis of the group B capsule in N. 
meningitidis are similar to those for K1, with one or two key 
differences. The biosynthesis of NeuNAc and its activation to CMP-NeuNAc 
appear to proceed via an identical pathway, and significant similarities 
exist between the meningococcal and E. coli NeuA, NeuB, and NeuC proteins 
(43). However, no homologues of either the E. coli NeuE or NeuD proteins 
have been identified in N. meningitidis group B (33). In addition, the 
polysialyltransferases are 30% identical (39), but the 
meningococcal polysialyltransferase appears to be the only enzyme required 
for the initiation and elongation of polysaccharide biosynthesis of the 
group B polysaccharide (33). Different enzymes in N. meningitidis group B 
and E. coli K1 regulate the availability of NeuNAc. In meningococci, a 
CMP-NeuNAc hydrolase cleaves CMP-NeuNAc (78). The gene that encodes this 
enzyme is not present in the group B capsule gene cluster but maps 
elsewhere on the meningococcal chromosome (33). E. coli K1 lacks this 
enzyme activity, and the levels of NeuNAc are regulated by a NeuNAc 
aldolase, an enzyme that is also not encoded within the K1 capsule gene 
cluster (136). 

The biosynthesis of the E. coli K5 antigen requires four proteins, 
KfiA-D, encoded within region 2 (Figure 1) (97). The functions of two of 
the encoded proteins were confirmed by over-expression of individual genes 
and subsequent assays of the recombinant proteins (97). The KfiC protein 
is a bifunctional glycosyltransferase enzyme that adds alternating 
glucuronic acid and N-acetylglucosamine residues at the nonreducing end of 
the growing polysaccharide chain (97). The KfiC enzyme is unable to 
initiate K5 biosynthesis itself, which suggests that the initial reactions 
for K5 biosynthesis are mediated by other enzymes encoded within region 2. 
Western blot analysis with antibodies to the purified KfiC enzyme 
confirms the enzyme's location on the inner face of the cytoplasmic 
membrane and reveals that association of the KfiC enzyme with the 
cytoplasmic membrane requires proteins encoded by regions 1 and 3 (G Rigg, 
G Griffiths, & I Roberts, unpublished data). This confirms the notion that 
a polysaccharide biosynthetic complex exists on the inner face of the 
cytoplasmic membrane. Alignment of the predicted amino acid sequence of 
KfiC with other known glycosyltransferase enzymes reveals three conserved 
regions that are likely to be functionally important in processive 
glycosyltransferase enzymes (97). The KfiD protein has been demonstrated 
to be a UDP-glucose dehydrogenase that converts UDP-glucose to 
UDP-glucuronic acid, which is a component sugar of the K5 polysaccharide (114). 

The functions of the KfiA and B proteins are so far unknown. 
Preliminary evidence suggests a role for these enzymes in the initial 
stages of K5 biosynthesis: possibly in the attachment of the first sugar 
residue to a membrane acceptor onto which the K5 polysaccharide is then 
synthesized by the KfiC enzyme (103). The nature of the membrane acceptor 
for the initiation reaction is still unknown. Whatever the function of 
both the KfiA and B proteins, they appear to be associated with the 
polysaccharide biosynthetic complex on the inner face of the cytoplasmic 
membrane (G Rigg & I Roberts, unpublished data). 

BIOSYNTHESIS OF GROUP I POLYSACCHARIDES 
The biochemistry of group I polysaccharide biosynthesis in E. coli is less 
well understood. By extrapolation from the biosynthesis of group I-like 
capsules from other bacteria, the biosynthesis is considered to involve two 
steps: first, the generation of undecaprenol-linked intermediates, and 
second, the polymerization of these intermediates at the reducing terminus 
to form polysaccharide. 

The biosynthesis of the capsular polysaccharide of Aerobacter 
aerogenes strain DD45 has been studied in detail (127). In this case, the 
tetrasaccharide repeat unit of the polysaccharide is synthesized on an 
undecaprenol phosphate carrier. Polymerization then occurs by the 
formation of a glycosidic bond and the transfer of one repeat unit to 
another, which is itself linked to undecaprenol phosphate, thereby forming 
an undecaprenol phosphate-linked molecule with two repeat units attached. 
The freed undecaprenol pyrophosphate is then dephosphorylated, which allows 
a cycling of the undecaprenol phosphate (127). This dephosphorylation 
reaction is inhibited by bacitracin (124), and the sensitivity of capsular 
polysaccharide biosynthesis to bacitracin indicates a role for the cycling 
of undecaprenol phosphate in capsule production. The biosynthesis of a 
number of extracellular polysaccharide molecules, including xanthan gum of 
Xanthomonas campestris (58), also proceeds via lipid-linked 
oligosaccharides. 

Colanic acid biosynthesis in E. coli is believed to be achieved by a 
similar mechanism using undecaprenol phosphate (140). The similarity 
between E. coli group Ia capsules and the capsules of Klebsiella suggests 
that group Ia capsules are synthesized in the same fashion. That group Ib 
capsules may be regarded as acidic O antigens (27, 59) suggests the 
following biosynthetic process: group Ib polysaccharide polymerization 
might occur at the reducing terminus using blocks of undecaprenol-linked 
repeating units, as observed with heteropolymeric O antigens (140). In 
the next few years, the biosynthesis of group I capsules should become 
clearer. 

THE GENETICS OF CAPSULE PRODUCTION IN GRAM-POSITIVE BACTERIA 
Capsule gene clusters have been cloned from a number of gram-positive 
bacteria, including Streptococcus pneumoniae (26, 44, 49), Staphylococcus 
aureus (73, 75), and group B streptococci (111). Because the biochemistry 
and export of these polysaccharides are as yet only poorly understood, I 
focus on the genetics of capsule production in these organisms. 

The gene clusters for the S. pneumoniae type 3 and 19F capsules have 
been cloned and analyzed (26, 44, 49). To date, seven genes have been 
identified within the type 3 capsule gene cluster (19, 25, 26). Four of 
these genes, cps3DSUM, are specific for the type 3 capsule. The genes are 
organized into two transcriptional units, cps3DS and cps3UM (25). This 
capsule-specific region is flanked by two regions common to pneumococci 
that express other capsule serotypes. Based on predicted amino acid 
sequence homology, possible functions were assigned to the type-specific 
proteins: Cps3D was assigned as a UDP-glucose dehydrogenase; Cps3S, a 
polysaccharide synthase; Cps3U, a glucose-1-phosphate uridylyltransferase; 
and Cps3M, a phosphoglucomutase (25). Mutations within the cps3U or cps3M 
genes do not abolish capsule production, which suggests that the functions 
performed by these proteins can be complemented by other enzymes in the 
cell (19). Immediately upstream of the cps3D gene is a 938-bp segment that 
encodes a small ORF and contains sequences that are repeated in the 
chromosomes of S. pneumoniae strains expressing types 2, 3, 5, 6B, 8, and 
22 (19). This region is not thought to play a role in the expression of 
the type 3 capsule but may be important in the acquisition of capsule gene 
clusters in S. pneumoniae (25). Three other genes, cps3BCP, have been 
identified upstream of the repeated sequence and are homologous to the 
cps19FBCD genes of the type 19F capsule gene cluster (19). Downstream of 
the cps3M gene are common sequences that in the case of the type 3 capsule 
contain the two genes tnpA and plpA (19, 25). However, both of these genes 
have undergone deletions and encode nonfunctional proteins (19). The TnpA 
protein is homologous to a number of transposases, but it lacks 200 amino 
acids from its N-terminal end and 490 amino acids from its C-terminus. The 
PlpA protein is a permease-like protein that is important in transformation 
in S. pneumoniae (95). In the case of the plpA gene that flanks the type 
3 capsule genes, the PlpA protein lacks the first 281 amino acids and is 
therefore defective. Polymerase chain reaction (PCR) has been used to 
analyze the sequences downstream of the type 3 capsule gene clusters in 
unrelated pneumococci that express type 3 capsules. Such analysis has made 
it possible to confirm the presence of defective tnpA and plpA genes 
downstream of all type 3 capsule gene clusters, which indicates that all 
type 3 strains are plpA minus (19). Recent data suggest that both Cps3M and 
PlpA may play important roles in sensing the environment and causing 
up-regulation of capsule expression in S. pneumoniae (J Yother, unpublished 
results). The PlpA system may operate in non-type 3 strains, and the 
Cps3M-mediated pathway may be in operation in type 3 strains that lack a 
functional PlpA (J Yother, unpublished results). 

Analysis of the type 19F capsule gene cluster revealed the presence of 
seven genes, cps19FA-G, organized in a single transcriptional unit (49). 
The Cps19FA protein is homologous to the transcriptional activator LytR, 
which suggests a possible role for this protein in the regulation of 
capsule expression. The cps19FA gene appears to be conserved in all of the 
serotypes analyzed, which suggests a general role for this protein in 
pneumococcal capsule expression (49). The function of the Cps19FB protein 
is not clear, but the protein is conserved in all of the serotypes 
analyzed. The Cps19FC and D proteins are homologous to the Cps3C and P 
proteins, and these proteins may be involved in the export of pneumococcal 
capsular polysaccharides (49). Comparison of the gene clusters for the 
type 3 and 19F capsules shows that conserved genes exist between the 
different capsule gene clusters, and some form of genetic organization is 
conserved. However, the presence of additional DNA in the type 3 capsule 
gene cluster and differences in the transcriptional organization in genes 
that are conserved between different serotypes suggest that the conditions 
in S. pneumoniae may be more complex than in E. coli. This complexity may 
reflect the ease with which genetic information can be taken up and exchanged 
within pneumococci. 

The genes for the production of both the type 1 and type 5 capsule 
serotypes have been cloned from Staphylococcus aureus (73, 75). The type 1 
capsule gene cluster is located on a large discrete genetic element of 
approximately 34 kb and appears specific to strains that express the type 1 
capsule (74). This is in contrast to the type 5 capsule gene cluster. 
Southern blot experiments using probes that spanned the type 5 capsule 
gene cluster identified sequences common to all capsule types, as well as 
capsule-specific sequences reminiscent of group II capsule gene clusters in 
E. coli (75). Nucleotide sequence data, when they become available, should 
resolve the apparent differences between the organization of the type 1 and 
type 5 capsule gene clusters of S. aureus. 

THE GENERATION OF CAPSULE DIVERSITY 
The huge number of chemically different capsular polysaccharides prompts 
the question: How has this diversity been achieved? In the case of 
gram-negative bacteria, the high A+T ratio of the DNA that encodes the 
polysaccharide biosynthesis enzymes (33, 43, 97, 121, 128) suggests a 
common ancestry of these capsule biosynthesis genes (41). In group II E. 
coli capsule gene clusters, the high A+T ratio of region 2 DNA, as compared 
to regions 1 and 3, would confirm that group II capsule diversity has been 
achieved in part through the acquisition of different region 2 sequences. 
The lack of any conserved sequences of different group II capsule gene 
clusters between regions 1 and 2 and regions 2 and 3 may preclude some form 
of site-specific transposition event as a means of changing region 2 
sequences (30). Rather, the acquisition of new region 2 sequences may 
occur through homologous recombination of an incoming and resident capsule 
gene cluster between the flanking regions 1 and 3. The marked divergence 
of the C-termini of the KpsS and KpsT proteins from different group II 
capsule gene clusters (30, 91, 97) would support this hypothesis. The 3' 
ends of the kpsS and kpsT genes are located, respectively, adjacent to the 
junctions of regions 1 and 2 and regions 2 and 3; therefore, these 
differences in the C-termini of the proteins may have arisen as a result 
of recombination events between regions 1 and 3 of different capsule gene 
clusters. The mechanism by which group II capsule genes have been acquired 
and why this set of different capsule genes is inserted at the same serA 
region of the chromosome are unknown. 
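
The base-composition comparison used in this argument can be sketched as follows. This is a minimal illustration that treats the "A+T ratio" as the fraction of A and T bases in a sequence; the three sequences are hypothetical toy strings chosen to show the contrast, not data from any capsule gene cluster, and the function name is illustrative only.

```python
def at_fraction(dna):
    """Fraction of unambiguous bases (A, C, G, T) that are A or T."""
    counted = [base for base in dna.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return sum(base in "AT" for base in counted) / len(counted)


# Hypothetical toy sequences standing in for the three regions.
toy_regions = {
    "region 1": "GCGCATGCCGGATCGC",
    "region 2": "ATTATAAATGCATTTA",
    "region 3": "GGCCATCGCGTAGCCG",
}

for name, seq in toy_regions.items():
    print(name, round(at_fraction(seq), 3))
```

A markedly higher value for the central region than for its flanks is the kind of pattern the text interprets as evidence that region 2 was acquired separately; with real sequences the comparison would normally be made over whole regions or sliding windows rather than short strings.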

The acquisition of regions 2 by homologous recombination may also be 
responsible for the generation of capsule diversity in both H. influenzae 
and N. meningitidis. However, in both these bacteria, additional 
chromosomal rearrangements are likely to be important in the expression of 
capsular polysaccharides. In N. meningitidis group B, expression of group 
B polysaccharide can be switched on and off by the insertion/excision of 
sequence IS1301 in the neuA gene (52). The inactivation of the neuA gene 
abolishes the production of CMP-NeuNAc, thereby abolishing both capsule 
production and sialylation of LOS. Loss of capsule will likely promote 
adherence and invasion of epithelial cells (52). The subsequent 
restoration of capsule production and LOS sialylation by the spontaneous 
excision of IS1301 from the neuA gene would permit the survival of the 
invading meningococci, by conferring resistance to complement-mediated 
killing and phagocytosis. 

In H. influenzae type b, the duplicated cap locus lies between direct 
repeats of an IS1016 and essentially generates a compound transposon that
contains the capsule gene cluster (69) . The presence of the IS1016 
elements allows the amplification of the cap locus, thereby augmenting type 
b capsule production. At one end of the duplicated cap gene cluster, there 
is a 1.2-kb deletion that removes most of one copy of the bexA gene. The 
remaining functional bexA gene is located in the bridge region that links 
the two copies of the cap gene cluster so that directly repeated capsule 
genes are necessary for capsule expression (67, 69) . This arrangement of 
the cap locus is preserved among nearly all H. influenzae type b strains 
responsible for the vast majority of invasive Haemophilus infections 
worldwide. Such an observation suggests that this deletion and cap gene 
duplication occurred in an ancestral type b strain and generated some form 
of selective advantage (69) . 

In S. pneumoniae, capsule type changes as a consequence of
transformation with donor DNA from a different capsule type. Homologous
recombination between conserved sequences that flank the serotype-specific
region serves as a likely model. As a rare consequence of transformation, 
two capsule types (binary encapsulation) may be produced by a single 
transformant (4). In the case of unstable binary encapsulation, the second 
capsule gene cluster is linked to the first, suggesting some form of 
illegitimate recombination event in the flanking sequences that is 
eventually resolved with the end result of a single capsule type (25) . 
Stable binary transformants have the second capsule gene cluster inserted
at a second distal site on the chromosome (8) . Although a number of 
possibilities exist, the basis for this stable acquisition of a second 
capsule gene cluster is not known. Hopefully, with a greater understanding 
of the molecular genetics of capsule gene clusters in S. pneumoniae, it may 
be possible to address this question. 

PERSPECTIVES 

Bacterial capsular polysaccharides are a diverse range of biologically 
important molecules. They play pivotal roles in mediating a number of 
biological processes, particularly in affecting microbe-host interactions 
during the onset and development of infectious disease. The use of 
molecular genetic techniques has moved forward our understanding of the 
organization of capsule gene clusters in a wide variety of bacteria and 
begun to shed light on the process of capsule diversity. The comparison of 
predicted amino acid sequences of encoded proteins illustrates that 
conserved themes, such as ABC-polysaccharide transport systems, are
present in diverse bacteria. However, there are still huge areas for which 
at the molecular level our knowledge is at best slight. This is 
particularly true for the export of polysaccharides in gram-negative 
bacteria. It seems to me that we have to think of novel biochemical 
approaches to begin to answer these questions. The prize is not only the 
knowledge itself; rather, it is the production of chemotherapeutic agents 
targeted to selectively disrupt capsule export and thus combat infections 
by encapsulated bacteria. The extension of our studies on capsule 
production to eukaryotic pathogens of human beings promises to be exciting. 

Added material 
Ian S. Roberts 

School of Biological Sciences, University of Manchester, Manchester M13
9PT, United Kingdom 

Table 1 Functions of polysaccharide capsules 

Table 2 Homology between proteins encoded by regions 1 and 3 of the K5
capsule gene cluster

Protein  Cellular location                  Similarity
KpsF     Cytoplasm                          GutQ (72 [percent] over 314 aa)
KpsE     Inner membrane                     BexC (73 [percent] over 359 aa)
                                            CtrB (73 [percent] over 355 aa)
KpsD     Periplasm                          ExoF (67 [percent] over 100 aa)
KpsU     Cytoplasm                          KdsB (70 [percent] over 246 aa)
KpsC     Associated with the inner face of  LpsZ (76 [percent] over 312 aa)
         the cytoplasmic membrane           LipA (70 [percent] over 550 aa)
KpsS     Cytoplasm                          LipB (68 [percent] over 396 aa)
                                            NeuA (66 [percent] over 246 aa)
KpsM     Inner membrane                     BexB (69 [percent] over 266 aa)
                                            CtrC (68 [percent] over 266 aa)
KpsT     Associated with the inner face of  BexA (89.4 [percent] over 220 aa)
         the cytoplasmic membrane           CtrD (85 [percent] over 220 aa)



Figure 1 Schematic representation of the organization of E. coli group 
II capsule gene clusters. The K1 capsule gene cluster is shown with the
three functional regions. The boxes labeled K92, K5, and K4 represent the
serotype-specific region 2s that are inserted between the conserved regions
1 and 3. 

Figure 2 Genetic organization of the E. coli K5, H. influenzae type b,
and N. meningitidis group B capsule gene clusters. For clarity, a single
copy of the H. influenzae capsule gene cluster is shown. The large boxes
denote the conserved functional regions (see text). The smaller labeled
boxes define specific genes that have been identified within each cluster.
The hatched boxes in region 2 of the K5 capsule gene cluster define
intergenic gaps.

ABSTRACT: Adherence to a surface is a key element for colonization of the
human oral cavity by the more than 500 bacterial taxa recorded from oral 
samples. Three surfaces are available: teeth, epithelial mucosa, and the 
nascent surface created as each new bacterial cell binds to existing dental 
plaque. Oral bacteria exhibit specificity for their respective colonization 
sites. Such specificity is directed by adhesin-receptor cognate pairs on
genetically distinct cells. Colonization is successful when adherent cells 
grow and metabolically participate in the oral bacterial community. The 
potential roles of adherence-relevant molecules are discussed in the 
context of the dynamic nature of the oral econiche. Reprinted by permission 
of the publisher. 

TEXT: 

INTRODUCTION 

Adherence mechanisms of oral bacteria are essential to bacterial 
colonization of the oral cavity. In their absence, bacteria become part of 
the salivary flow and are swallowed. Consequently, oral bacteria have 
evolved several mechanisms to fulfill this role. The mechanisms are highly 
specific in that the oral cavity is colonized principally by bacteria that 
are only found in the oral cavity. For example, bacteria of the intestinal 
flora are found in the oral cavity in very low numbers, if at all. 
Likewise, oral bacteria are found infrequently in other ecosystems. They 
can, however, cause infections in other regions of the body, and the oral 
cavity is often considered a reservoir for these infectious bacteria. 
Although humans have ample opportunity for exchange of bacteria among us, 
we tend to maintain our own personal flora even after antibiotic therapy 
(153) . 

Three principal surfaces are available for oral bacterial
colonization. Both teeth and mucosal epithelial cells are coated with 
bacteria. The third surface is the nascent bacterial layer that 
constitutes dental plaque on teeth and the initial coating on epithelial 
cells. Indeed, bacteria recognize and bind to other kinds of bacteria in 
suspension and on substrata. Accretion of bacteria onto a surface forms a 
nascent layer, while adherence of bacteria in suspension forms mixed 
cell-type coaggregates. Coaggregation was first reported by Gibbons &
Nygaard in 1970 (66) and is defined as the recognition and adhesion between 
genetically distinct bacteria (105) . Suspended coaggregates may also 
accrete and contribute to the formation of dental plaque (190) . 

This review includes a discussion of the developments in 
coaggregation; it was first reviewed in 1988 (102) . Adhesins and 
complementary receptors are described, and other forms of oral bacterial 
adherence such as binding to teeth and epithelial cells are coordinated 
within the framework of an oral microbiological econiche. Finally, a newly 
developing area of invasion of host cells by oral bacteria is reviewed. 

Several other reviews of adherence of oral bacteria should be 
consulted for a complete discussion of this topic. The field of oral 
bacterial adherence is very active and has expanded considerably in the 
last seven years. For additional discussions on specific topics, the 
reader is referred to some excellent reviews on oral bacterial invasion 
(57) , streptococcal receptors and molecular modeling (21) , streptococcal 
adherence mechanisms (76, 87, 88) , Streptococcus mutans group (16) , ecology 
of coaggregations (118, 119), surface structures of oral bacteria (75), and 
saliva-bacterium interactions (181) . 

SURFACES, OCCUPANTS, AND MULTIPLE MECHANISMS OF ADHERENCE 
More than 500 bacterial taxa have been recorded in samples taken from the
human oral cavity, and about 300 have been named (153). The most prevalent
oral bacteria are members of the 22 genera listed in Table 1. Although 13 
genera are gram-negative, a single gram-positive (Actinomyces naeslundii) 
and a gram-negative (Fusobacterium nucleatum) species predominate in dental 
plaque under nearly all conditions of health or disease (153) . Surveys 
have been conducted that encompassed at least 1000 strains representing 
these genera. Strains from each genus coaggregate with a specific set of 
partner strains, usually from a different genus. Mixing a pair of partners 
results in clumps or coaggregates composed of an interacting network of the 
two cell types. By contrast, simple cellular agglutination or aggregation 
refers to interactions between cells that are genetically identical. Most 
and probably all oral bacteria coaggregate with at least one partner cell 
type. The first report, or in some cases the first extensive report, of 
coaggregations involving members of each genus is listed. Numerous 
additional studies have been conducted for most of the genera, and those 
reports are discussed in the various sections of this review. 

While coaggregations have been studied since 1970, invasion of
epithelial cells by oral bacteria has been examined only recently and was
first reported in the early 1990s for suspected periodontal pathogens
Actinobacillus actinomycetemcomitans (152) and Porphyromonas gingivalis
(127). A third potential periodontopathogen, Treponema denticola, may not
actually invade cells but rather insert its chymotrypsin-like proteinase
(205).

Each oral bacterium has the capability of several mechanisms for 
adherence; two examples follow. P. gingivalis strains possess the armament 
to coaggregate (112) , bind to saliva-coated hydroxyapatite (SHA) (27) , 
hemagglutinate (24) , and adhere to and invade epithelial cells (123) . This 
bacterium also binds to several matrix molecules, like fibronectin (130) , 
fibrinogen (131), and collagen (159), and produces proteases that may 
promote adherence (5) . Streptococcus gordonii PK488 adheres to SHA and 
coaggregates with A. naeslundii PK606 (10, 109, 110) , other streptococci 
(113), and fusobacteria (112) by different mechanisms. Mutants of strain 
PK488 that fail to coaggregate with PK606 retain the lactose-inhibitable 
coaggregations with streptococci and the lactose-noninhibitable 
coaggregations with fusobacteria, and they bind to SHA. Thus, multiple 
adhesins on a given cell are likely to mediate distinct interactions with 
different surfaces, which can be animate or inanimate. 

POPULATION COMPOSITION CHANGES 
Left undisturbed, our teeth accrete a thick layer of plaque, a coating of 
bacteria mixed with salivary and serum molecules. Immediately following 
professional teeth cleaning, a thin host-derived layer called the acquired 
pellicle covers the tooth surface. The pellicle consists of numerous 
components, including mucins, glycoproteins, proline-rich proteins, 
histidine-rich proteins, enzymes like alpha-amylase, phosphate-containing
proteins like statherin, and other molecules. The acquired pellicle 
coating appears to mask the contribution of the substratum surface, for 
amalgam, hydroxyapatite, and titanium were colonized in vivo by the same 
kinds of bacteria (134) . Bacteria rapidly accrete and in turn become an 
available surface for adherence by other bacteria. Bacteria are immersed 
in salivary and serum molecules that they can utilize as nutrients (39, 
202) . This combination of recognition, accretion, and growth leads to 
successful colonization (16a, 190, 191) . 



TEMPORAL RELATIONSHIP OF SUBGINGIVAL OCCUPANTS AND COAGGREGATION
PARTNERSHIPS 

An organism may be able to adhere but unable to grow in a particular lotic 
or flowing environment. When the environment becomes more favorable, the 
organism may proliferate and constitute a major portion of the biofilm. 
Consider a possible scene of repopulation of a tooth surface after 
cleaning. The acquired pellicle covers the tooth. Streptococci, the 
principal early colonizers, bind to acidic proline-rich proteins (65, 81)
and other receptors like alpha-amylase (182, 183) and sialic acid (44, 81) 
in the acquired pellicle. Streptococci also participate in intrageneric 
coaggregation (81, 113, 124), which offers an extra advantage in allowing 
them to bind to the nascent monolayer of already bound streptococci (165, 
191). In addition, actinomyces, which are other primary colonizers, bind
to the acquired pellicle (64) and to the streptococci (see 102, 103, 115,
118).

Each streptococcal and actinomyces strain binds specific salivary 
molecules. Thus, from a common pool of salivary molecules, each strain of 
early colonizer may be coated with distinct molecules. Identical cells 
coated with a specific salivary molecule may agglutinate, which would lead 
to a microconcentration and juxtapositioning of a particular strain. 
Alternatively, growth of a particular accreted strain would lead to a 
microcolony coated with specific salivary molecules. Such events could 
dramatically alter the diversity of salivary molecules exposed to later 
colonizers. In this way, early colonizers may dictate small adjustments in 
the temporal accretion of later colonizers. 

Both streptococci and actinomyces are facultatively anaerobic, and 
doubling times for microbial populations during the first four hours of 
plaque development are less than one hour (209) . Consequently, these two 
groups of primary colonizers are thought to prepare the environment for 
later colonizers that have more fastidious requirements for growth and that 
grow more slowly (209). Other bacteria like fusobacteria, veillonellae,
propionibacteria, rothiae, capnocytophagae, and prevotellae bind to 
streptococci and/or actinomyces (see 118) . Each new accreted cell becomes 
a nascent surface and therefore becomes a coaggregation bridge to the next 
potentially accreting cell type that passes by. The accreted cells 
metabolize and the econiche changes. Cells that could not survive the 
originally aerobic environment before are now capable of colonizing the 
anaerobic plaque, but they require attachment points. We have proposed 
that a principal one is the anaerobic fusobacteria, which as a group 
coaggregates with all oral bacteria. Accordingly, the fusobacteria act as 
bridges to anchor the environmentally more-fastidious late colonizers. 
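
The growth figure quoted above can be turned into a rough worked example. The following is only a sketch under a simplifying assumption that the text does not make: growth is treated as purely exponential with a constant doubling time, and attachment, detachment, and nutrient limitation are ignored. The function name is hypothetical.

```python
def fold_increase(elapsed_hours: float, doubling_time_hours: float) -> float:
    """Fold increase of a population under idealized exponential growth.

    Assumption (not from the review): the doubling time stays constant
    over the whole interval.
    """
    if doubling_time_hours <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (elapsed_hours / doubling_time_hours)


# A doubling time of one hour sustained for four hours gives 2**4 = 16-fold.
one_hour_case = fold_increase(4.0, 1.0)

# A shorter (hypothetical) doubling time of 30 minutes gives 2**8 = 256-fold.
half_hour_case = fold_increase(4.0, 0.5)
```

Because the reported doubling times are less than one hour, the 16-fold figure is a lower bound under this idealization, not a measured value.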

The evidence for in vivo implementation is based on the fact that 
early colonizers coaggregate among each other and with fusobacteria. Late 
colonizers coaggregate with fusobacteria but do so infrequently among 
themselves (see 119 for further discussion) . Some interactions between 
late colonizers have been reported (71, 218) . In a study of Sudanese adult 
periodontitis patients, Prevotella intermedia was never found unless F. 
nucleatum was also present (6) . Late colonizers like P. gingivalis and T. 
denticola are found in close association in vivo (96, 188) . Nutritional 
communication with metabolic end products has been reported between T. 
denticola and P. gingivalis (72). Consideration of the kinds of bacteria
successful in initial adherence to the acquired pellicle, the highly
specific coaggregation partnerships, and the temporal colonization profiles
of bacteria reveals that all are essential for participation in development
of stable dental plaque. Other factors are also essential for successful 
colonization. For example, metabolic communication among the community and 
evasion of host defenses play active roles. 

Fusobacteria are not numerous in the initial stages of colonization of 
the tooth surface, but they increase to about 50 [percent] of the numbers of 
either actinomyces or streptococci in normal plaque. They can bind to 
statherin (64) , a component of the acquired pellicle, but their ability to 
coaggregate with members of all the genera makes them unusual (112) . Each 
Fusobacterium nucleatum strain has its own set of partners, which includes 
intrageneric coaggregation with other fusobacteria (120). Considering
fusobacteria in total, they coaggregate with all other human oral bacteria 
so far tested (97, 98, 108, 112, 120). Many of these coaggregations are 
lactose-inhibitable (97, 108, 120), which adds to the already extensive 
number of lactose-inhibitable coaggregations among other early colonizers 
(118). The thought of using a rinse of lactose to disrupt dental plaque
should be dismissed for two reasons. First, there are an equal or greater
number of coaggregations that are insensitive to lactose. Second, lactose,
like sucrose, is a metabolizable sugar that leads to plaque growth and acid
production rather than inhibition of plaque formation. The diversity of
oral bacteria and their multiple adherence mechanisms lead one to think of
dental plaque as a dynamic biofilm. Daily oral hygiene disrupts the 
bacterial community, which reforms daily with readily available nutrients 
both from food intake and host-derived salivary and serous molecules. 

SALIVARY PROTEINS/PELLICLE RECEPTORS
The bacteria that are present in the oral cavity are in constant contact
with saliva. As soon as an organism enters the mouth, it becomes coated
with specific salivary proteins, which can increase the adhesion of the
bacteria to oral tissues. As well as providing receptors for adhesion,
saliva can aggregate bacteria (41) and may aid in their removal. Oral
bacteria recognize different receptors within the salivary pellicle (41).
Many experiments are done by coating hydroxyapatite, a model for the tooth
surface, with saliva or specific saliva components.

Actinomyces, porphyromonads, and streptococci bind to acidic
proline-rich proteins (PRPs) attached to hydroxyapatite. In in vitro
assays, the type 1 fimbriae expressed by Actinomyces viscosus interact with
several proline-rich salivary molecules coating either hydroxyapatite or
polystyrene (31). Immobilized proline-rich glycoproteins (PRG) have been
shown to promote adhesion of some oral bacteria to hydroxyapatite, e.g. A.
viscosus (31), F. nucleatum (67), and S. gordonii (136). Multiple salivary
components, including the low molecular-weight salivary mucin and highly
glycosylated proline-rich glycoproteins (156), alpha-amylase (156, 183) and
proline-rich peptides (65), promote adhesion of streptococci.

Several species of oral streptococci including S. gordonii bind 
salivary alpha-amylase (183), a plaque constituent. Only animals exhibiting
amylase activity in their saliva harbor amylase-binding
streptococci. Amylase-binding bacteria are numerous in supragingival
plaque, suggesting that this enzyme may serve as a pellicle receptor. 

Fimbriae are composed of structural subunits called fimbrillin and are 
required for the attachment of P. gingivalis to whole saliva-coated 
hydroxyapatite beads (133) . P. gingivalis fimbriae are important in 
virulence, for they possibly mediate initial adherence and colonization of 
the oral cavity. Recombinant P. gingivalis fimbrillin protein (r-Fim) was 
able to bind to SHA (186) . PRP1 and statherin were found to significantly 
enhance the binding of r-Fim to hydroxyapatite. The C-terminal region of 
the fimbrillin subunit of fimbriae appears to be responsible for this 
binding. These results suggest that P. gingivalis fimbriae bind strongly 
through protein-protein interactions to PRPs and statherin molecules bound 
to surfaces.

Different streptococcal species preferentially colonize different oral 
sites and coadhere to a varied range of bacteria, so it is possible that 
each of the various oral streptococci has evolved unique adhesins. Hence,
there appear to be homologous adhesins present in streptococci, but their 
functions have been honed so that they cooperate, not compete, with each 
other for binding sites. Mucin-like salivary glycoprotein interacts with 
many species of oral streptococci and may function in regulating
streptococcal colonization of oral tissues. This interaction may induce a 
nonimmunologic mechanism for clearing these organisms from the oral cavity. 
Alternatively, salivary agglutinin that is bound to oral tissue surfaces 
may promote streptococcal adherence to these tissues (18, 101) . Among the 
oral streptococci, much research is conducted on the mutans group
streptococci because of their cariogenic potential. Streptococcus mutans
secretes glucosyltransferase, which enzymatically converts sucrose to
glucans, and both the enzyme and glucan are integral parts of the acquired
pellicle. Adsorbing glucosyltransferase to SHA and incubating in the
presence of sucrose promotes the formation of glucans, which can function 
as specific binding sites for S. mutans (187) . 

STREPTOCOCCAL ADHERENCE 
INTERACTIONS 

During the transition from colonization predominantly by streptococci and 
actinomyces in the first few hours to later colonizing genera, a vast array 
of surface molecules are presented to the environment. As each new cell 
type adheres, its cell body becomes a nascent surface presented to the 
environment. In this way, dental plaque quickly presents numerous possible 
receptors and adhesins available for specific recognition among different 
strains. Such interactions are presumed to occur by some of the same 
molecules as those that mediate coaggregations. This idea is shown
diagrammatically in Figure 1. The four topics discussed are intrageneric
coaggregation, intergeneric coaggregation, multigeneric coaggregation, and
accretion.

Figure 1 is not intended to be comprehensive but rather illustrative 
of possible interactions of streptococci with their changing environment. 
The mediation of interactions represented by identical symbols is the
simplest interpretation of the available data. Several streptococci are
named and shown, but the focus of this discussion is on S. gordonii PK488,
which coaggregates with A. naeslundii PK606 (109). This intergeneric
coaggregation is mediated by a putative adhesin (obelisk with stem) on the
streptococcus that recognizes a complementary receptor (obelisk) on the
actinomyces.

In Figure 1, several types of coaggregations are shown to be mediated 
by the same receptor (the solid black rectangle) . For example, 
intrageneric coaggregation between S. gordonii PK488 and either S. oralis 
34 or Streptococcus SM PK509 is depicted as mediated by these surface 
components. COG- mutants of S. gordonii PK488, selected by their failure 
to coaggregate with one of the streptococci (represented by solid black 
rectangle) , lose their ability to coaggregate with the other streptococcus 
but retain it with A. naeslundii PK606 (represented by
obelisk). Likewise, S. gordonii DL1 and Streptococcus sanguis 12 (both are 
depicted in Figure 1 directly above S. gordonii PK488) also coaggregate 
with both Streptococcus SM PK509 and S. oralis 34 in indistinguishable 
lactose-inhibitable coaggregations. Although no mutant of Streptococcus SM
PK509 has been isolated, it is expected that one selected by its failure to 
coaggregate with S. gordonii PK488 [altered in the receptor (solid black 
rectangle)] would also be unable to coaggregate with either S. gordonii DL1 
or S. sanguis 12. Such COG- mutants of S. oralis 34 have been isolated, and 
they predictably lost their ability to coaggregate with S. gordonii DL1 and 
S. sanguis 12, but retained coaggregation with Streptococcus SM PK509. 

S. gordonii DL1 COG- mutants selected on the basis of inability to 
coaggregate with S. oralis 34, which is a streptococcal strain similar to 
S. oralis C104, are also unable to coaggregate with S. oralis C104 and 
Streptococcus SM PK509, but they retain the ability to coaggregate with A. 
naeslundii PK606 (33) . Several spontaneous mutants were tested, and each 
failed to express a 100-kDa surface protein (32). S. gordonii PK488 also
expresses a 100-kDa surface protein that reacts with an antiserum that 
recognizes the 100-kDa protein on S. gordonii DL1 (32) . This antiserum 
blocks coaggregation between S. gordonii DL1 and S. oralis 34, S. oralis 
C104, or Streptococcus SM PK509, but it has no effect on the intergeneric 
coaggregation between S. gordonii DL1 and A. naeslundii PK606 (32) . These 
results suggest that many intrageneric interactions may be functionally 
similar and are mediated by structurally similar molecules on the 
coaggregation partners. 

The intergeneric coaggregations between S. gordonii DL1 and either A. 
naeslundii PK606 (Figure 1, upper right) or Propionibacterium acnes PK93 
(Figure 1, lower right) are also shown to be mediated by the receptors (solid
black rectangles). This feature distinguishes the adhesins (rectangles with
stem) on the three streptococci. Of these three streptococci, only S. 
gordonii DL1 exhibits lactose-inhibitable coaggregation with A. naeslundii 
PK606 and P. acnes PK93. Thus, the lactose-sensitive adhesin on S.
gordonii DL1 has a broader specificity than the adhesin on the other two 
streptococci. Spontaneous COG- mutants of S. gordonii DL1 that were 
selected on the basis of failing to coaggregate with streptococci also lost 
the lactose-inhibitable coaggregations with A. naeslundii PK606 and P. 
acnes PK93 (33) . These data suggest that there is only one putative 
lactose-sensitive adhesin on S. gordonii DL1. Although the receptor (solid
black rectangle) on S. oralis 34 and Streptococcus SM PK509 may be
recognized by all three streptococci (S. sanguis 12, S. gordonii DL1, and
S. gordonii PK488), the receptors (solid black rectangles) on the other
partners (A. naeslundii PK606 and P. acnes PK93) are not likely to be
identical to the homologous S. oralis 34 and Streptococcus SM PK509
receptors.

The fourth mode of adherence shown in Figure 1 is accretion to the 
tooth surface (lower left) . Four streptococcal strains are shown, and all 
express an adhesin (semicircle with stem) that recognizes a receptor 
(shaded ellipse) . All strains are known to bind to saliva-coated 
hydroxyapatite (26, 53, 59, 60, 110, 167). It is not known whether all 
four bind to the same or different receptors of the acquired pellicle, but 
all four express the putative adhesin ScaA or a homolog (10, 110) . The 
putative adhesin (semicircle with stem) is discussed in detail below. The 
molecular nature of the receptor(s) is unknown. Streptococcus parasanguis
FW213 does not coaggregate with other streptococci or with A. naeslundii 
PK606 (110), so it is depicted in Figure 1 with a single adhesin, whereas
the other three streptococci are shown with three kinds, each recognizing a
distinct receptor. Several sophisticated chemical analyses of
streptococcal receptors have been reported recently (1-3, 20, 69, 170, 171)
and are discussed below. Colonization of S. gordonii DL1 (Challis) in the
oral cavity of mice occurred in a lactobacillus-free mouse but not in a
conventional mouse (139).

ADHESINS 

Most known oral bacterial adhesins are from streptococci (Table 2) . The 
subunit size ranges from 35 to 380 kDa. The adhesins are presented at the 
surface by at least two distinct mechanisms. The first includes many 
proteins in the range of 210-260 kDa that contain the sequence LPxTG near
the C terminus, which may be important for proper sorting in the cell wall 
(185) . A model incorporating the peptide cleavage between the T and G of 
the LPxTG motif, in which the cleavage has similarity to the 
transpeptidation reaction of cell wall synthesis, has been proposed (161) . 
This model proposes the mechanisms whereby the C-terminal T can be linked 
to the following: the N-terminal amino acid of the peptide cross-bridge, 
the free amino group of the diaminopimelic acid, or lysine of the 
tetrapeptide attached to the carbohydrate backbone of the bacterial cell 
wall.
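
The LPxTG description above can be illustrated with a short sequence scan. This is a minimal sketch, not a validated sorting-signal predictor: the 50-residue C-terminal window, the function names, and the toy sequence are all hypothetical choices made for illustration, and only the motif itself and the cleavage position between T and G come from the text.

```python
import re

# LPxTG: leucine, proline, any residue, threonine, glycine.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(sequence: str, c_terminal_window: int = 50) -> list[int]:
    """Return 0-based start positions of LPxTG motifs near the C terminus.

    The window size is an arbitrary illustrative cutoff, not a value
    taken from the review.
    """
    seq = sequence.upper()
    cutoff = max(0, len(seq) - c_terminal_window)
    return [m.start() for m in LPXTG.finditer(seq) if m.start() >= cutoff]


def cleave_between_t_and_g(sequence: str, motif_start: int) -> tuple[str, str]:
    """Split a sequence between the T and G of an LPxTG motif."""
    cut = motif_start + 4  # L, P, x, T stay on the N-terminal fragment
    return sequence[:cut], sequence[cut:]


# Toy sequence (hypothetical, not a real adhesin): motif starts at index 61.
toy = "M" + "A" * 60 + "LPETG" + "A" * 20
positions = find_lpxtg(toy)
retained, released = cleave_between_t_and_g(toy, positions[0])
```

In this toy case the retained fragment ends in threonine, which corresponds to the C-terminal T that the proposed model links to the cell wall.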

The second mechanism of surface presentation is based on the fact that 
many of the proteins of 35-78 kDa contain the sequence LxxC, which is the 
consensus motif for lipoproteins (77, 78) . This motif is located about 20 
amino acids from the N-terminus of the prolipoprotein and is the site of 
cleavage by signal peptidase II, an integral membrane protein that 
catalyzes the cleavage and yields cysteine as the N-terminal amino acid. 
The cysteine becomes lipid-modified, and surface exposure of these
lipoproteins appears to occur by anchoring the amino terminus in the 
cytoplasmic membrane and by exposure of the C-terminal region. Thus, there 
appear to be two distinct presentations: (a) peptidoglycan linkage and 
surface exposure of the N-terminal region of the adhesin, and (b) 
cytoplasmic membrane anchoring and surface exposure of the C-terminal 
region of the adhesin.
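
The lipoprotein motif described above lends itself to a similar sketch. The code below assumes only what the text states: an LxxC sequence about 20 residues from the N terminus, with cleavage leaving the cysteine as the first residue of the mature protein. The 35-residue search limit, the function names, and the toy sequence are hypothetical and are not taken from the review.

```python
import re

# Lookahead so that overlapping candidate motifs are all examined.
LXXC = re.compile(r"(?=L..C)")


def find_lipobox_cysteine(sequence: str, max_cys_position: int = 35):
    """Return the 0-based index of the Cys in the first N-terminal LxxC motif.

    Returns None when no motif places its cysteine within the first
    max_cys_position residues (an illustrative cutoff).
    """
    seq = sequence.upper()
    for match in LXXC.finditer(seq):
        cys_index = match.start() + 3
        if cys_index < max_cys_position:
            return cys_index
    return None


def mature_lipoprotein(sequence: str):
    """Return the sequence after cleavage, starting at the motif cysteine."""
    cys_index = find_lipobox_cysteine(sequence)
    return None if cys_index is None else sequence[cys_index:]


# Toy prolipoprotein (hypothetical): LAAC begins at index 17, Cys at index 20.
toy_precursor = "M" + "K" * 16 + "LAAC" + "GSS"
cys_at = find_lipobox_cysteine(toy_precursor)
mature = mature_lipoprotein(toy_precursor)
```

A pattern match of this kind only flags candidates; it says nothing about whether a protein is actually lipid-modified or surface exposed.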

The streptococcal adhesins CshA and CshB bind to actinomyces cell 
surface molecules (148, 149). They are antigenically related, are produced
by S. gordonii DL1 (Challis), and are encoded by genes at distinct 
chromosomal loci (149). CshA has four regions of interest. The precursor 
form has a 41 amino acid leader peptide, a mature peptide N-terminal 
segment of residues 42-878, 13 repeating blocks of 101 amino acids and 
three shorter blocks to comprise amino acids 879-2417, and a C-terminal 
anchor domain with a sequence LPxTG for proper anchoring to the cell wall 
(161) . In fact, CshA has four additional LPxTG sequence motifs that occur 
at the same position in their four respective repeating 101 amino acid 
blocks (149) . Insertional mutations in cshA (which encodes CshA) caused 
reduced cell-surface hydrophobicity and reduced ability to coaggregate with 
strains of A. naeslundii (148) . Mutations in cshB (which encodes CshB) had 
less effect on hydrophobicity and coaggregation. Both CshA and CshB were 
required to confer S. gordonii with the ability to colonize the murine oral 
cavity (149) . 
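
The residue ranges quoted for the CshA precursor can be checked with simple arithmetic. The sketch below uses only the numbers given in the paragraph above and assumes, as the text does not state explicitly, that the 13 full blocks and the three shorter blocks tile residues 879-2417 without gaps. The variable names are illustrative.

```python
def span_length(first: int, last: int) -> int:
    """Number of residues in an inclusive range first..last."""
    return last - first + 1


LEADER_LENGTH = 41                 # leader peptide, residues 1-41
N_TERMINAL_SEGMENT = (42, 878)     # mature N-terminal segment
REPEAT_REGION = (879, 2417)        # repeating blocks
FULL_BLOCKS = 13
FULL_BLOCK_LENGTH = 101
SHORT_BLOCKS = 3

n_terminal_length = span_length(*N_TERMINAL_SEGMENT)       # 837 residues
repeat_region_length = span_length(*REPEAT_REGION)         # 1539 residues
full_block_total = FULL_BLOCKS * FULL_BLOCK_LENGTH         # 1313 residues
short_block_total = repeat_region_length - full_block_total  # 226 residues
mean_short_block = short_block_total / SHORT_BLOCKS
```

Under that tiling assumption, the three shorter blocks together account for 226 residues, or roughly 75 residues each on average.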

The exciting discovery of lipoproteins on the surface of oral 
streptococci (86, 176) has resulted in an awareness of other lipoproteins 
among putative adhesins (10, 54, 59, 86, 89, 110, 176, 177). Streptococcal 
coaggregation adhesin ScaA is a 34.7-kDa lipoprotein (110) expressed by S. 
gordonii PK488, which coaggregates with A. naeslundii PK606 (109) . It is a 
member of the lipoprotein receptor antigen (Lra) group 1 (88) . The genes 
for six of these proteins have been identified in six species (Table 2) . 
Reactive fragments were found by Southern hybridization with a scaA probe
in 12 oral streptococci (10). These streptococci also expressed surface
proteins that migrated as 36-38-kDa molecular size in sodium
dodecyl sulfate-polyacrylamide gel electrophoresis gels, and they were
detected in immunoblots using antiserum that blocks coaggregation between 
S. gordonii PK488 and A. naeslundii PK606 (10). COG- mutants of S. 
gordonii PK488 that do not coaggregate with A. naeslundii PK606 also do not 
produce this protein (109) . 

The other five members of group 1 include FimA, which is a 36-kDa
protein associated with the fimbriae of S. parasanguis FW213, which can
block attachment of this organism to SHA (167). Both insertion of the
aph-3 kanamycin resistance gene cassette into fimA and deletion of fimA of 
S. parasanguis FW213 reduced binding of the streptococci to fibrin
monolayers and reduced the virulence of the streptococci in a rat model of
endocarditis (19). FimA shows 87 [percent] homology to a 34.7-kDa saliva
binding protein (SsaB) from S. sanguis 12 (60). The scaA sequence shows
86 [percent] and 73 [percent] homology with the ssaB and fimA adhesin genes,
respectively (110). A 37-kDa surface protein PsaA has been identified from
Streptococcus pneumoniae, which shows 80 [percent] homology with SsaB, and
92.3 [percent] homology with FimA (177). A 34.7-kDa surface protein EfaA
from Enterococcus faecalis also showed between 55 and 60 [percent] homology with
this group of streptococcal proteins (142). The nucleotide sequence of the
homologous scbA gene of Streptococcus crista, which forms corncob
coaggregates with fusobacteria, was reported (35).

No function has, so far, been assigned to scbA, psaA, and efaA. All 
six of these lipoproteins form a homologous group based on sequence. While
the functions of some have been proposed, difficulty in purifying sufficient
amounts of these proteins has hampered complete characterization of their
functions.

Recently, two additional homologous proteins in quite different 
bacteria have been reported. A manganese transport protein of the
cyanobacterium Synechocystis 6803 is 30 [percent] identical (54 [percent]
similar) in sequence to ScaA (14). The nucleotide sequence of the gene
encoding a 31-kDa rare outer membrane protein from Treponema pallidum
subsp. pallidum predicts repeated stretches of amphipathic beta-sheets
typical of membrane-spanning sequences of outer membrane proteins (15).
The deduced Tromp1 protein sequence is 29 [percent] identical and
53 [percent] similar to ScaA. Evidence for homologs in such varied bacteria
suggests that these proteins may have evolved distinct functions needed by 
their respective organisms. On the other hand, these proteins may be 
essential for a common function such as maintaining surface layers in 
proper orientation, and their loss, if not fatal, may have a causal effect 
on adherence.

The DNA sequence of the clone containing scaA indicates that scaA is
part of an ATP-binding-protein cassette (ABC) system found originally to
constitute the binding-protein-dependent transport systems of
gram-negative enteric bacteria (9, 80, 84, 164) and the lipoprotein
transporters of oligopeptides and sugars of gram-positive bacteria (7, 8,
176). The ABC system of transporters is found in bacteria and
eukaryotes (79) and consists of three basic parts: one or two ATPases, one
or two hydrophobic membrane proteins, and one substrate-specific binding
protein. In gram-negative bacteria the binding proteins are found in the
periplasm, but in gram-positive bacteria the binding lipoproteins were
proposed to be anchored in the cytoplasmic membrane (68), which has since been
shown (7). The lipoproteins of gram-positive bacteria have been recently
reviewed (197). They are lipid-modified at the N-terminus of the mature
protein and are thus anchored to the membrane. This leaves the C-terminal
region available for binding to its cognate receptor.

The idea that a streptococcal lipoprotein may also be important in 
adherence functions was proposed by Jenkinson in 1992 (86) and later 
expanded by our laboratory to involve an ABC system (110) consisting of 
ScaA lipoprotein adhesin, hydrophobic membrane protein, and ATP-binding 
protein encoded by a putative ABC operon (11, 110). Considering that
substrate-binding lipoproteins may recognize their cognate ligand either at 
the cell surface or within the porous peptidoglycan layer, it is not 
difficult to envision that these lipoproteins may also recognize the 
identical sequence of the cognate ligand when it is part of a receptor on 
the surface of a coaggregating partner cell. In addition, calcium ions, 
which are required for coaggregation, may be bound by a lipoprotein, as 
occurs for the Mn transporter of Synechocystis 6803 (14).

Binding between the coaggregation mediators has been thought to occur 
externally to the peptidoglycan at the bacterial cell surfaces. The net 
negative charge of the bacterial cell surface is often proposed as a 
barrier to cell-to-cell contact phenomena. However, just as soluble 
oligopeptides and sugar substrates are able to penetrate the porous 
peptidoglycan layer, it is proposed that fibrous molecules (capsular and 
fimbrillar) on a coaggregating partner cell surface can be recognized by a 
cytoplasmic membrane-anchored lipoprotein adhesin. Even a relatively small
molecule like the 34.7-kDa ScaA lipoprotein has 310 amino acids, and an
alpha-helix of 21 amino acids is sufficient to span about 30 Å (or 3.0 nm)
(51). So, the logistics of presenting the C-terminal region to a point
external to or near the cell surface do not seem insurmountable for a
membrane-anchored lipoprotein.
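
The span estimate for a 21-residue helix can be reproduced with a short calculation. The sketch below assumes the textbook axial rise of about 1.5 Å per residue for an ideal alpha-helix; that constant is not given in the text, and the function name is hypothetical.

```python
# Assumed constant: axial rise per residue of an ideal alpha-helix (angstroms).
# This is a textbook value, not a figure stated in the source.
RISE_PER_RESIDUE_ANGSTROM = 1.5


def helix_length_angstrom(n_residues, rise=RISE_PER_RESIDUE_ANGSTROM):
    """Length in angstroms of an ideal alpha-helix with n_residues residues."""
    return n_residues * rise


if __name__ == "__main__":
    span_21 = helix_length_angstrom(21)    # the 21-residue helix cited in the text
    span_310 = helix_length_angstrom(310)  # upper bound if all of ScaA were one helix
    print(f"21 residues:  {span_21:.1f} angstroms ({span_21 / 10:.2f} nm)")
    print(f"310 residues: {span_310:.1f} angstroms ({span_310 / 10:.2f} nm)")
```

The 310-residue figure is only a geometric upper bound for a fully helical, fully extended chain; the folded protein would reach a much shorter distance, so the calculation illustrates plausibility rather than an actual dimension of ScaA.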



A set of three 76-78-kDa lipoproteins, called HppAGH for hexa- or 
heptapeptide permease, from S. gordonii DL1 was necessary for cell growth 
on peptides of 5-7 amino acid residues (89). It was proposed that they
form a binding-protein complex for uptake of primarily hexa- and heptapeptides.
Inactivation of the hppA gene caused reduced growth rate of cells, affected 
competence onset and transformation efficiency, and caused an increase in 
aminopterin resistance. These three proteins are highly homologous to the 
AmiA, AliA (PlpA), and AliB lipoproteins of an oligopeptide permease system
in S. pneumoniae (7, 8, 168), which also affect competence and
transformation efficiency. Mutations in amiA and plpA (aliA) confer a
greater than 50 [percent] decrease in pneumococcal adherence to epithelial
and endothelial cells (37). HppA (originally SarA; see 90) was first
reported as a protein involved in serum-induced cell aggregation,
saliva-mediated aggregation, and coaggregation with some actinomyces
strains.

Coaggregations among streptococci are very common and are often 
inhibited by GalNAc (113). One of these streptococci, S. gordonii DL1, 
bears the putative adhesin that recognizes a GalNAc-containing carbohydrate 
receptor on several streptococci (32, 33). Transposon Tn916 was used to
insertionally inactivate coaggregation-relevant genes of S. gordonii DL1.
COG- mutants were unable to coaggregate with the streptococcal partners but
retained the ability to coaggregate with partners belonging to other
genera. The region flanking the Tn916 insertion was sequenced and used to
identify a 0.5-kb EcoRI chromosomal fragment, which was cloned into pQ
(143), an E. coli-streptococcal insertion vector. Insertion mutants showed
altered coaggregation with streptococci, but again retained wild-type
coaggregation properties with other genera of bacteria. DL1 expressed a
100-kDa surface protein that was absent in both the Tn916 and pQ insertion
mutants (215), as had also been observed with spontaneous COG- mutants
(32). This is the first report of a streptococcal adhesin mediating
intrageneric coaggregation. 

Streptococcus gordonii G9B expresses an 80-kDa antigen complex that is 
necessary for its adherence to SHA (129, 172). A 153-kDa protein on the
surface of S. gordonii G9B that mediates bacterial adhesion to human
endothelial and epithelial cells in vitro has also been identified (206).
The purified adhesin has glucosyltransferase (GTF) activity, is able to bind
directly to host cells, and is also able to block subsequent adhesion of
viable streptococci. Adhesion of S. gordonii to endothelial cells can be
inhibited with low molecular weight dextrans and antibodies specific to
the 153-kDa protein. GTF hydrolyzes sucrose and catalyzes the formation of
water-soluble and water-insoluble glycans, which may mediate bacterial
binding to the tooth surface. GTF synthesis is positively regulated by the
immediately upstream gene, rgg, in S. gordonii (207). Both rgg and the GTF
structural gene, gtfG, are found on the same restriction fragment in
strains of S. gordonii, S. sanguis, and S. oralis (207). However, no
rgg-like determinants were found in mutans streptococci, Streptococcus mitis,
and Streptococcus salivarius (207). The significance of these important
findings to colonization of the oral streptococci remains to be elucidated. 

Extensive regions of sequence homology are found among the 
polypeptides comprising a group of surface proteins referred to as I/II 
antigens, P1-like polypeptides, and antigen B, which are called the antigen
I/II family (Table 2) in this review. They are in the molecular size range 
of 165-210 kDa and are members comprising a conserved gene family (144) 
found in Streptococcus sobrinus, S. mutans, and S. gordonii strains. They 
are often functionally different. For example, in S. mutans KPSK2 the 
interaction of the polypeptide MSL-1 with salivary agglutinin is inhibited 
by fucose and lactose (41), but in S. gordonii the interactions of SSP5 
(44, 124) and SspA (91) with salivary agglutinin are inhibited by 
N-acetylneuraminic acid. Homologous proteins in other oral streptococci
not only bind to salivary agglutinins (16, 36, 40, 95, 203) but also to
other components of the salivary pellicle (101, 132), and they can mediate
coaggregation with A. naeslundii (91) and P. gingivalis (125).

RECEPTORS 

Many viridans streptococci synthesize cell wall polysaccharides that 
consist of a strain-specific hexa- or heptasaccharide repeating unit linked 
end to end by phosphodiester bonds (Table 3). Cisar, Bush, and their
collaborators have purified and structurally identified several
polysaccharides (1-4, 170) that are the likely receptors for the
Gal/GalNAc-reactive lectins of streptococci (32, 113, 215),
veillonellae (82), prevotellae (140), hemophili (121), and those
associated with the type 2 fimbriae on oral actinomyces (29, 30, 99).
Cassels and coworkers purified from S. oralis ATCC 55229 a
rhamnose-containing polysaccharide that inhibited coaggregation between the
streptococcus and a capnocytophaga (20, 22), an L-rhamnose-inhibitable
coaggregation. This repeating hexasaccharide does not contain GalNAc
residues and is not linked by phosphodiester bonds, but rather the
phosphate is found as a glycerol phosphate substitution on an internal
galactose moiety (69). The polysaccharide from S. mitis K103 (171) does not
have the GalNAc b-(1 --> 3)Gal or Gal b-(1 --> 3)GalNAc found in the top
five polysaccharide sequences (Table 3, underlined disaccharide), and
strain K103 exhibits a different coaggregation pattern (81).

The GalNAc b-(1 --> 3)Gal or Gal b-(1 --> 3)GalNAc disaccharides are
part of glycoconjugate receptors found on the surface of human cells. Oral
bacteria may evade the host secretory immune response by expressing these
glycoconjugates. Other bacteria like actinomyces and veillonellae may mask
their presence by surrounding their cell bodies through coaggregations with
these streptococci. Rosettes and corncob morphological arrangements
composed of one cell type surrounded by a different cell type are common in
dental plaque (103, 104). The fine specificity of oral actinomyces lectins
observed in their adherence to immobilized glycolipids with exposed GalNAc
b-containing glycoconjugates has been proposed to reflect the actinomyces
tissue-specific colonization (195, 196). In a study of actinomyces
coaggregation with S. oralis 34 [GalNAc b-(1 --> 3)Gal-containing
polysaccharide, Table 3] and S. mitis J22 [Gal b-(1 --> 3)GalNAc-containing
polysaccharide, Table 3], only a minor difference in
specificity was noted (30). However, a major difference was seen in the
ability of streptococci and actinomyces to recognize these two
disaccharides (30). Whereas the actinomyces coaggregate with both S. oralis
34 and S. mitis J22 and exhibit only a twofold difference in inhibition by
either disaccharide, intrageneric coaggregations between S. oralis 34
[GalNAc b-(1 --> 3)Gal-containing polysaccharide, Table 3] and S. gordonii
DL1 (expresses 100-kDa putative adhesin; 32, 215) are 30-fold more
inhibited by GalNAc b-(1 --> 3)Gal (30). Cisar et al (30) proposed that
the actinomyces recognize the sides of the disaccharides that are
structurally similar, while the streptococci recognize the side of the
GalNAc b-(1 --> 3)Gal that contains the acetamido group. Such clear
specificity is likely to have a role in the complex microbial interactions
that construct oral biofilms.

Molecular modeling of the streptococcal polysaccharides of S. oralis 
34, S. oralis J22 (also called S. mitis J22) and S. oralis ATCC 55229 
indicated that the polysaccharide can fold to form a loop initiated at the 
Galf and consisting of three or four sugar residues. The loop is 
stabilized by hydrogen bonding interactions between the nearest phosphate 
to the reducing end and the Galf residue, which provides the needed 
flexibility (21). Within the loop is the lectin-binding site (Table 3,
underlined disaccharides). Since GalNAc b-(1 --> 3)Gal-containing capsular
polysaccharide seems to be the streptococcal adhesin-binding site, it is
not surprising that streptococci, e.g. S. oralis J22 and S. oralis ATCC
55229, lacking this sequence do not participate in intrageneric
coaggregation (113). The cell wall polysaccharide of S. mitis K103 (Table
3) contains no galactofuranose and no lectin-binding site, and its
coaggregation properties are distinct from nearly all the other 71 strains
of viridans streptococci examined in a recent survey (81).

ACTINOMYCES ADHERENCE 
The most-studied actinomyces are members of A. naeslundii (many were 
formerly called Actinomyces viscosus) (92), Actinomyces israelii, 
Actinomyces odontolyticus, and Actinomyces serovar WVA963. As with the
streptococci, actinomyces express multiple kinds of adherence mechanisms,
which include fimbriae-attached proteins and protease-resistant receptors
recognized by adhesins on cells of other genera (103).

Two kinds of fimbriae have been reported. Type 1 fimbriae mediate 
binding to SHA and specifically to salivary proline-rich proteins. Type 2 
fimbriae mediate lactose-inhibitable binding to other oral bacteria and to 
mammalian cells. Some actinomyces strains bear both types of fimbriae, and
others bear only type 2. No natural isolates bearing only type 1 fimbriae
have been reported, although mutants have been constructed that express 
only type 1 fimbriae. 

The genes encoding the fimbrial subunits of type 1 and type 2 fimbriae 
of A. naeslundii strains T14V (type 1+, type 2+) and WVU45 (type 1-, type
2+) have been cloned and sequenced (43, 221-223). Both subunits have a
molecular size of about 60 kDa, and both contain amino-terminal signal
sequences and the cell surface binding motif LPxTG in the carboxy-terminal
region that is characteristic of many gram-positive surface proteins (184,
185). The deduced amino acid sequences of A. naeslundii T14V FimA (fimA
gene product; structural subunit of type 2 fimbriae) and that of A.
naeslundii WVU45 FimP (fimP gene product; structural subunit of type 1
fimbriae) are 34 [percent] identical (223). The amino acid sequence
identity between the FimA proteins of A. naeslundii WVU45 and A. naeslundii
T14V is 70 [percent] (222; JA Donkersloot, personal communication).
However, it remains unclear if the FimA subunit possesses the
lactose-sensitive adhesive function or if an accessory fimbrial protein is
responsible (28).

A significant advance in studies of actinomyces adherence has been 
made by Yeung and colleagues, who reported a genetic transfer system (224) 
and used integration plasmids to generate site-specific mutations in A. 
naeslundii (220). In a careful study of three such mutants, it was
determined that fimP was essential for type 1 fimbrial synthesis, but that 
fimP was not sufficient for conferring bacterial adherence (220). The 3' 
region near fimP on the A. naeslundii T14V chromosome may be the site of 
the putative adhesin gene. A minor protein at the tip of the type 1 
fimbriae possibly mediates the binding of A. naeslundii T14V to salivary 
proline-rich proteins (162).

Actinomyces mutants missing type 2 fimbriae fail to coaggregate with 
their streptococcal partners. Our laboratory and collaborators initiated a 
study of Actinomyces serovar WVA963 strain PK1259, the reference strain of 
the actinomyces coaggregation group F (117). This strain was chosen
because it exhibits only coaggregation that is lactose-inhibitable with
reference strains of three oral streptococcal coaggregation groups (102).
Considering that these coaggregations may all be mediated by type 2
fimbriae, coaggregation-defective (COG-) mutants should be lacking these
fimbriae. Although Actinomyces serovar WVA963 is among the top 20 most
frequently isolated bacteria from subgingival sites, nothing has been
reported about its fimbrial structures.

Two coaggregation phenotypes were observed among the mutants, and 
these are depicted diagrammatically in Figure 2. On the left side is shown
the wild-type strain PK1259 and its coaggregations with V. atypica PK1910
(complementary set of triangles and triangles with stem), F. nucleatum
PK1594 (complementary set of semicircles and semicircles with stem), and
the members of streptococcal coaggregation groups 3, 4, and 5
(complementary sets of rectangles and rectangles with stems). No
inhibitors are known for the interactions with veillonella and
fusobacteria. Both the wild type and COG- mutants exhibit fimbriae (100).
Type 1 and type 2 fimbriae are composed of subunits of 59 kDa and 57 kDa,
respectively (100). The type 2 fimbriae are depicted in Figure 2 as
protruding downward and bearing at the tip the proposed lactose-sensitive
adhesins that mediate coaggregations with the three groups of streptococci.
Type 1 fimbriae are represented as protruding from the top of the cell, and
they are also present on COG- mutant strains PK2407 and PK3092, the two
strains representing the two mutant phenotypes. All three strains bind to
salivary proline-rich protein-coated latex beads, produce the 59-kDa
subunit of type 1 fimbriae, and are agglutinated by antiserum against A.
naeslundii type 1 fimbriae (100). COG- mutant PK2407 retains coaggregation
with F. nucleatum PK1594 but does not coaggregate with V. atypica PK1910,
whereas COG- mutant PK3092 retains coaggregation with both the
fusobacterium and the veillonella. 

Antisera against the wild type PK1259 that have been absorbed with the
COG- mutant PK3092 blocked coaggregations between the wild type and
the three streptococcal partners (100). Surprisingly, absorption of the
antisera with COG- mutant PK2407 removed the blocking antibodies.
Antiserum against A. naeslundii T14V type 2 fimbriae agglutinated the
wild type and COG- mutant PK3092, but it did not agglutinate COG- mutant
PK2407. Anti-type 2 serum reacted with a 57-kDa protein on immunoblots of
surface proteins of the wild type and COG- mutant PK3092, but this protein
was absent from COG- mutant PK2407 (100). These results suggested that a
protein separate from the type 2 fimbrial subunit was responsible for the
lactose-inhibitable coaggregation with streptococci. Wild-type cells
subjected to mild sonication released a 95-kDa protein that specifically
bound to lactose-agarose beads and was eluted by lactose (100). COG- mutant
PK3092 did not have this protein. In contrast, larger amounts of the 95-kDa
protein were released from COG- mutant PK2407 than from the wild type
(100). These observations are depicted in Figure 2 as improper orientation
of putative lactose-sensitive adhesin (solid rectangles) on the PK2407 cell
surface; the orientation is opposite to that required for proper
functioning as a coaggregation mediator. However, since COG- mutant PK3092
bears the 57-kDa type 2 fimbrial subunit but not the 95-kDa putative
adhesin, it is proposed that Actinomyces serovar WVA963 PK1259 mediates
lactose-inhibitable coaggregation with streptococci by a 95-kDa
lactose-sensitive adhesin that is a minor protein attached to the type 2
fimbriae.

The lactose-inhibitable coaggregations between actinomyces adhesins 
and streptococcal cognate receptors were reported to have a much greater 
impact than nonspecific interactions in a study of actinomyces binding to
streptococcal-coated hexadecane droplets (50). The results also agree with
the independent nature of the multiple kinds of possible interactions by a
community of oral bacteria (107). For example, lactose-inhibitable
coaggregations among cells in dental plaque are prevented in the presence
of lactose, but lactose-noninhibitable interactions bind the community as a
multigeneric coaggregate. An analysis of the kinetics of accretion of
streptococci onto actinomyces bound to the substratum in a parallel flow
chamber indicated a 19-fold higher accretion with coaggregating bacteria
than with a COG- mutant of the streptococcus (17). The results are best
explained by collisions occurring between the parallel-flowing, accreting
streptococci and the already adherent actinomyces. 

Actinomyces naeslundii WVU45 cells have been shown to bind to human 
buccal epithelial cells by recognizing a 180-kDa salivary glycoprotein that 
was bound to the epithelial cells (12). In a different study, both a
130-kDa glycoprotein and certain gangliosides were implicated in the
attachment of A. naeslundii WVU45 to sialidase-treated polymorphonuclear
leukocytes (178). This actinomyces, as well as numerous others, possesses
potent sialidases, and the sialidase genes from the various actinomyces are
highly conserved (219).

PREVOTELLA ADHERENCE 
Prevotella loescheii PK1295 bears a galactoside-specific adhesin that
mediates hemagglutination of neuraminidase-treated human erythrocytes
(212) and lactose-inhibitable coaggregations with oral streptococci (118).
The adhesin is a 75-kDa protein (140) that is expressed on the cell surface
in a maximum of 400 molecules per cell (214). The adhesin is associated
with the distal portion of the bacterium's fimbriae. The gene plaA
(Prevotella loescheii adhesin A) (146) contains a programmed frameshifting
hop (147), an infrequently found event in prokaryotes. In such events the
ribosome reads through the interrupted sequence and realigns itself in the
correct reading frame for translating the remainder of the message. The
role of a frameshifting hop in plaA is unclear. Perhaps the hop allows a
coupling of two functions, of which the second is yet to be discovered. P.
loescheii PK1295 expresses a second adhesin of 45 kDa, which mediates
lactose-noninhibitable coaggregation with A. israelii PK14 (213). By using
monoclonal antibodies to the 75- and 45-kDa adhesins, it was shown by
immunoelectron microscopy that essentially all cells in a population express
both adhesins (141).

Earlier studies had shown that the coaggregation partners of 
prevotellae appear to be primarily actinomyces (103), and two recent
surveys confirmed these data and showed that the prevotellae bear the
protein coaggregation adhesin and the actinomyces bear a
carbohydrate-containing receptor on their respective cell surfaces (34,
163). The actinomyces partners are mostly members of actinomyces
coaggregation groups C, D, and E (102). The prevotellae also coaggregate
with F. nucleatum, and in most of the coaggregations the fusobacterium is
the heat-inactivated cell type (112). The prevotellae synthesize proteases,
and one of these synthesized by P. loescheii degraded the 75-kDa PlaA adhesin
(23). The authors proposed that the protease may actually aid in
detachment of the prevotellae from the coaggregation partner cell and,
thus, allow the prevotellae to search out a different econiche in plaque.
This attractive idea follows the notion that bacterial community members
possess sensory surface molecules that may couple sensing to protease
activity, which activates detachment strategies and initiates the search
for a more suitable econiche (104).

PORPHYROMONAS ADHERENCE 
P. gingivalis cells exhibit a broad capability of binding to various oral 
surfaces such as other oral bacteria (70, 97, 108, 112, 126, 135, 157, 173, 
193, 194, 218), salivary components (63, 145), fibronectin-collagen
complexes (160), erythrocytes and monocytes (166), and epithelial cells
(25, 85). A 150-kDa surface component of P. gingivalis binds to fibrinogen
(131). Fibrinogen (158) and histatins (155) inhibit coaggregation between
porphyromonads and streptococci. 

Coaggregation between P. gingivalis and S. gordonii G9B is inhibited 
by a 43-kDa salivary protein (193), but proteases rapidly degraded this 
protein. Both porphyromonads (5, 131) and streptococci (128, 138) produce
proteases which may function in eliminating inhibitors. In contrast,
surface proteases of P. gingivalis may mediate coaggregations with A.
viscosus (135). Considering the proposed role of proteases in inactivating
adhesins in prevotellae (23), the role of proteases in porphyromonad
adherence may be different. In addition, the fimbrial gene fimA of P.
gingivalis could be insertionally inactivated without affecting the ability
to coaggregate with S. gordonii G9B, but adherence to SHA was reduced in
the mutant (145). In a different study, purified fimbriae of P. gingivalis
bound to S. gordonii G9B and blocked the porphyromonas-streptococcus
coaggregation (122). The authors concluded that the fimbriae were at least
partly responsible for coaggregation with S. gordonii G9B (122). Purified
porphyromonas fimbriae were also shown to bind to A. naeslundii (70). So,
the porphyromonad fimbriae may play a role in coaggregation, but no minor 
accessory protein with an adhesin function has been reported. 

P. gingivalis also produces extracellular vesicles that bind to 
serum-, saliva-, and crevicular fluid-coated hydroxyapatite (27), mediate 
binding of streptococci to serum-coated hydroxyapatite (189), and act as
coaggregation bridges between the noncoaggregating Eubacterium saburreum
and Capnocytophaga ochracea (73). The observation that vesicles could bind
to fibronectin, fibrinogen, collagen, and laminin (45) suggests that P.
gingivalis may employ these vesicles to enhance its advantage in
colonizing oral surfaces. 

FUSOBACTERIUM ADHERENCE 
As a group, strains of Fusobacterium nucleatum coaggregate with members of 
all oral bacterial genera so far examined. These include 12 genera 
surveyed in 1989 (112) and Treponema, as well as intrageneric 
coaggregations among F. periodonticum and F. nucleatum strains (120). The
first report of the extensive nature of lactose-inhibitable coaggregations
among gram-negative partnerships indicated that, of 43 gram-negative pairs,
22 were completely inhibited and another 14 were partially inhibited by 60
mM lactose (112). Further studies with F. nucleatum and Porphyromonas
gingivalis showed equal inhibition with lactose, N-acetyl-D-galactosamine,
and D-galactose (108). The fusobacterium was inactivated by heat or
protease treatment, but the porphyromonas cells were unaffected by these
treatments (108). These results were confirmed and extended by
demonstrating that the porphyromonad was inactivated by sodium
metaperiodate treatment (97). Also, antiserum against a 42-kDa outer
membrane protein of F. nucleatum blocks coaggregation with P. gingivalis
(98). Coaggregation-defective mutants of F. nucleatum selected for
inability to coaggregate with P. gingivalis also could no longer exhibit
lactose-inhibitable coaggregation with other gram-negative partners (N
Ganeshkumar & PE Kolenbrander, unpublished data). Collectively, these data
suggest that coaggregation between F. nucleatum and several gram-negative
partners may be mediated by the same galactoside-sensitive fusobacterial
surface component.

F. nucleatum binds to an 89-kDa salivary proline-rich glycoprotein, 
and deglycosylation of the glycoprotein by b-galactosidase abolishes 
binding (67). Adherence of F. nucleatum to human peripheral blood
lymphocytes is blocked by galactosides (204). Fibronectin is a component
of saliva, and fusobacteria bind to epithelial cells coated with
fibronectin (13). This binding can be inhibited by coating the
fusobacteria with fibronectin, laminin, and type IV collagen. A
fusobacterial hemagglutinin aggregated 14 of 17 strains of oral
streptococci, and the hemagglutination was inhibited by L-arginine (199).
A polypeptide of 39.5 kDa was obtained from the surface of F. nucleatum,
and it inhibited coaggregation of the fusobacterium with streptococci
(94). Antiserum against the polypeptide blocked coaggregation. In a
survey of 33 oral bacterial strains, only F. nucleatum was a coaggregation
partner of five Eubacterium species (62). Arginine, histidine, lysine, and
glycine inhibited these coaggregations. L-Arginine also inhibits the
interaction between a hemagglutinin from F. nucleatum and oral streptococci
(199). Elucidation of the wide variety of options for adherence by oral
fusobacteria helps to explain why these bacteria are among the most 
numerous bacteria in dental plaque. 

TREPONEMA ADHERENCE 
Some coaggregations between T. denticola and fusobacteria are inhibited by
D-galactosamine (120). A survey of 22 strains of Treponema spp., including
all four named human oral species, showed that all coaggregated with
selected F. nucleatum and F. periodonticum strains but not with members of
9 other genera (120). While the fusobacteria were inactivated by heat
treatment, the treponemes were resistant to heating. In a separate study
both treponemes and P. gingivalis partner cells required heating to
eliminate coaggregation (71). Interactions of T. denticola with P.
gingivalis were partially inhibited by saliva and serum (218). It is not
surprising that T. denticola and P. gingivalis are found together in deep
periodontal pockets (96). Moreover, T. denticola was never found in
periodontically affected sites unless P. gingivalis was also present (188).
Bacteroides forsythus and S. crista coaggregate with T. denticola, and only
the interaction with S. crista is inhibited by saliva (218). An
interesting model of polar adhesion of T. denticola to surfaces has been
proposed (49).



VEILLONELLA ADHERENCE 
Veillonellae are metabolically poised to flourish in communities dominated 
by streptococci and actinomyces because they utilize the lactic acid end 
products produced during the growth of the latter two groups on sugars. 
The veillonellae are detectable shortly after streptococci colonize the 
sterile neonate oral cavity. Some streptococci coaggregate with 
Veillonella atypica PK1910 by lactose-inhibitable coaggregations, but a 
class of V. atypica mutants has completely lost the ability to participate 
in these coaggregations (83). Both parent and mutant cells exhibit
fimbriae, but only the parent produces a 45-kDa surface protein that binds
to streptococcal cells and to lactose-agarose beads (82). Antiserum
prepared against the 45-kDa protein-bound lactose-agarose beads blocked
veillonella-streptococcus coaggregations. It appears that the 45-kDa 
putative adhesin mediates lactose-inhibitable coaggregation and that it is 
not likely to be a structural subunit of the fimbriae. 



CAPNOCYTOPHAGA ADHERENCE 
Capnocytophaga are gram-negative bacteria associated with moderate 
periodontal lesions (153, 169). They do not possess fimbriae; their 
adhesin molecules are presumably intercalated into the outer membrane. 
Tempro et al found that a 140-kDa polypeptide from the 
outer membrane of Capnocytophaga gingivalis DR2001 was recognized by 
monoclonal antibodies that blocked coaggregation between the 
capnocytophaga and Actinomyces israelii PK16 (201). Radio-iodinated 
monoclonal antibodies were used to calculate values of between 220 and 
280 adhesin sites per cell. The sites were arranged nonuniformly on the 
cell surface. 

In experiments by Weiss et al, a rhamnose-sensitive adhesin from C. 
ochracea ATCC 33596 was determined to be 155 kDa by using monoclonal 
antibodies that blocked coaggregation with S. oralis ATCC 55229 (see Table 
3 for structure of receptor) (211). They proposed that the adhesin is an 
outer membrane protein. 



INVASION 

Several bacteria isolated from the oral cavity have been seen inside 
epithelial cells. These include T. denticola (49), P. gingivalis (127), 
and Actinobacillus actinomycetemcomitans (152). Invasion is considered an 
important virulence factor, offering protection from the host immune system 
and contributing to tissue damage in a nutritionally rich environment that 
is free of competing organisms. The bacteria first attach to the 
epithelial cell membrane and then induce a series of structural and 
biochemical changes that facilitate bacterial penetration. 



TREPONEMA DENTICOLA 
Confocal microscopy and transmission electron microscopy revealed T. 
denticola fully within human gingival fibroblasts, close to and in 
the same plane as the nucleus and wound around actin filaments (49). Other 
studies indicate that T. denticola cells have been detected only rarely 
inside host cells (38). In cultures of migrating epithelial cells, the 
addition of T. denticola leads to extensive membrane 
blebbing of some solitary cells (205). Following longer exposure to the 
bacteria, some cells show large blebs, membrane damage, and invasion by T. 
denticola. 

T. denticola produces a chymotrypsin-like outer membrane-associated 
enzyme. The purified proteinase causes membrane blebbing, perhaps by 
degrading the connection of actin to the cell membrane (205). Diffusion of 
the proteinase or other toxic substances inside the epithelial cells may 
interfere with the intracellular signalling networks and lead to blebbing. 
Most of the surface cells of epithelial multilayers bind few, if any, T. 
denticola cells and have shown little blebbing as evidenced by scanning 
electron microscopy. Neighboring cells have been shown to retract on 
exposure to bacteria. Transmission electron microscopy has shown no 
evidence of intact T. denticola inside the epithelial multilayers; however, 
immunogold electron microscopy has indicated that the bacterial antigens, 
including the chymotrypsin-like proteinase, penetrate into the cell 
cytoplasm and accumulate inside the intracellular vacuoles. The 
chymotrypsin-like proteinase is associated with a 53-kDa abundant outer 
membrane treponema protein. This protein complex seems to play a role in 
the attachment and virulence of the organism. The protein mediates 
attachment to fibronectin, laminin, and fibrinogen (74) and is able to 
create unusually large pores in artificial lipid membranes (48). 

PORPHYROMONAS GINGIVALIS 
Bacterial colonization of gingival tissue and its penetration and 
destruction are critical processes in the pathogenesis of periodontal 
disease. P. gingivalis adheres in vitro to human buccal epithelial cells 
and gingival sulcular epithelial cells. Electron microscopy demonstrated 
that P. gingivalis adheres to and internalizes within primary cultures of 
gingival epithelial cells (127). The epithelial cells showed no visible 
damage. P. gingivalis binds to and invades multilayered gingival pocket 
epithelial cells (180) and a human oral epithelial cell line (KB) (46, 
179). Internalized P. gingivalis is found associated directly with the 
epithelial cell cytoplasm and not encapsulated by endocytic vacuoles (123, 
180). Lamont et al (123) found that invasion was significantly lower 
during the lag phase of growth. Protein synthesis is necessary, and about 
90 min is required to complete the invasion process for most of the 
bacteria. Increasing the incubation of bacteria with epithelial cells to 2 
h did not result in increased invasion. However, killing of the external 
bacteria after 90 min of incubation, followed by an additional 4 h of 
incubation, resulted in a higher recovery of internal P. gingivalis. These 
results indicate that the bacteria are dividing within the epithelial 
cells. 



ACTINOBACILLUS ACTINOMYCETEMCOMITANS 
Several mechanisms have been identified for adherence of A. 
actinomycetemcomitans to epithelial cells (56). Extracellular amorphous 
material, fimbriae, and extracellular vesicles may all be involved (150, 
151). Certain A. actinomycetemcomitans strains undergo a variant shift 
that is associated with changes in adherence and invasion properties. 
Smooth variants invade more proficiently than rough variants. Fimbriae 
most probably function in adherence of rough variants, whereas non-fimbrial 
components (e.g. vesicles) are probably involved in adherence of smooth, 
highly invasive strains (152). Transmission electron microscopy of the 
cell monolayers has revealed A. actinomycetemcomitans within the KB cells. 
The bacteria were surrounded by an endosomal vacuole of the KB cell that 
disintegrates, releasing the organisms into the cytoplasm (192). Invasion 
is an active process; both the KB cells and the bacteria must be 
metabolically active. Accordingly, the likelihood of invasion is much 
higher with midexponential phase cells. 



BACTERIAL COMMUNICATION 
Physical adherence of cells is communication. The recent evidence for 
involvement of lipoproteins in adherence as well as in substrate binding 
suggests that surface localized binding proteins may act as communicators 
to assist or perhaps to mediate permanent cell-to-cell contact. The 
enterococcal pheromone binding lipoproteins PrgZ (174) and TraC (200) also 
illustrate adherence-relevant functions in bacterial communication. 
Furthermore, a signal peptide fragment from a staphylococcal protein, TraH, 
exhibits enterococcal pheromone activity (55). Such examples of breadth in 
communication among microbes suggest that numerous mechanisms of 
communication exist in a biofilm like dental plaque. We have examined (PE 
Kolenbrander & EP Greenberg, unpublished data) the possibility that 
autoinducer homologs of the homoserine lactones (58) are produced by 
gram-negative oral bacteria and by cultures of unaltered dental plaque 
scrapings. No homologs were detected by three assay systems, suggesting 
that the communication signals exchanged in these communities differ from 
the homoserine lactones. 

Biofilms representing several econiches have been studied, but the 
development and use of the scanning confocal laser microscope (SCLM) has 
been a critical advance in our ability to monitor these biofilm 
communities. SCLM allows nondestructive experimentation and has just 
recently been applied to oral communities. The same technology has been 
exploited in a study of a biofilm that degraded a commercial herbicide 
(217). The investigators observed highly specific patterns of intra- and 
intergeneric coaggregations. They also suggest that adaptation of a 
biofilm to degrade a recalcitrant substrate involves restructuring the 
community and development of different cellular arrangements to promote 
efficient metabolic communication (217). 

Added material 

Catherine J. Whittaker, Christiane M. Klier, and Paul E. Kolenbrander 
Laboratory of Microbial Ecology, National Institute of Dental 
Research, National Institutes of Health, Bethesda, Maryland 20892 

Table 1 Frequently isolated human oral bacteria and some adherence 
properties 

Genus                                                    Coaggregation  Invasion 
Actinobacillus                                           (112)          (152) 
Actinomyces                                              (66) 
Bacteroides (FNa)                                        (218) 
Campylobacter (FNb)                                      (118) 
Capnocytophaga                                           (114) 
Corynebacterium (formerly called Bacterionema)           (154) 
Eikenella                                                (47) 
Eubacterium                                              (62) 
Fusobacterium                                            (112) 
Gemella                                                  (112) 
Haemophilus                                              (137) 
Lactobacillus                                            (216) 
Neisseria                                                (66) 
Peptostreptococcus                                       (112) 
Porphyromonas (formerly called Bacteroides in part)      (112)          (127) 
Prevotella (formerly called Bacteroides in part)         (111) 
Propionibacterium                                        (26) 
Rothia                                                   (106) 
Selenomonas                                              (112) 
Streptococcus                                            (66) 
Treponema                                                (120) 
Veillonella                                              (66) 


FOOTNOTES 

a Only a few species including Bacteroides forsythus remain. Most are 
reclassified as either Prevotella or Porphyromonas species. 

b Genus contains strains previously classified as Wolinella. 

c Transport of T. denticola chymotrypsin-like proteinase into newly 
formed large intracellular vacuoles within the epithelial cells was 
reported. 

Table 2 Adhesins and binding proteins of human oral bacteria (FNa) 
{Table omitted} 

FOOTNOTE 

a Former names: Actinomyces naeslundii (A. viscosus) T14V; Actinomyces 
naeslundii (A. viscosus) WVU627; Porphyromonas gingivalis (Bacteroides 
gingivalis) ATCC 33277; Prevotella loescheii (Bacteroides loescheii) 
PK1295; Streptococcus crista (S. sanguis) CC5A; Streptococcus gordonii (S. 
sanguis); Streptococcus oralis ATCC 55229 (S. sanguis H1); Streptococcus 
parasanguis (S. sanguis) FW213; Veillonella parvula (V. alcalescens) V1 

Table 3 Polysaccharide receptors of viridans streptococci (FNa) 
{Table omitted} 

FOOTNOTE 

a Abbreviations: Rha, L-Rhamnose; Gal, D-Galactose; Glc, D-Glucose; GalNAc, 
N-Acetyl-D-Galactosamine; Glyc, Glycerol; p, pyranose; f, furanose. Square 
brackets indicate repeating hexa- to octa-saccharide units. Underlined 
disaccharides indicate lectin-binding sites. 

Figure 1 Diagrammatic representation of streptococcal adherence to 
other streptococci, to other genera of oral bacteria, and to the acquired 
pellicle coating on teeth, all depicted as complementary pairs of symbols. 
Putative adhesin, obelisk with stem; complementary receptor, obelisk. The 
obelisk with stem represents a cell surface molecule known to be 
inactivated by heat (85°C/30 min), whereas its complementary symbol 
is insensitive to the same amount of heating. Inset box at center right: 
the obelisk and ellipse represent lactose-noninhibitable interactions; the 
bottom two rectangles represent lactose-inhibitable coaggregations. The 
light gray rectangle represents coaggregation between Streptococcus SM 
PK509 and Streptococcus oralis 34 (113), shown in the upper left corner. 
Solid black rectangles indicate coaggregation with S. gordonii PK488 (118) 
and are shown as mediators of several other coaggregations. See text for 
more information. 

Figure 2 Diagrammatic representation of the role of actinomyces type 2 
fimbriae and the putative adhesin in mediating coaggregation of Actinomyces 
serovar WVA963 strain PK1259 with oral streptococcus coaggregation groups 
3, 4, and 5. Coaggregation-defective mutant strains PK2407 and PK3092 are 
shown, lacking type 2 fimbriae and putative adhesin, respectively. Type 1 
fimbriae are represented as zigzag lines. The type 2 fimbriae are 
represented by solid lines. Lactose-inhibitable coaggregations are 
depicted by rectangles, as in Figure 1. See text for more description. 
Also represented is the independent nature of interactions between 
Actinomyces serovar WVA963 and its coaggregation partner streptococci, 
fusobacteria, and veillonellae. 

REVIEW 

The 97-megabase genomic sequence of the nematode Caenorhabditis 
elegans reveals over 19,000 genes. More than 40 
percent of the predicted protein products find significant 
matches in other organisms. There is a variety of repeated 
sequences, both local and dispersed. The distinctive distribution 
of some repeats and highly conserved genes provides 
evidence for a regional organization of the chromosomes. 

The genome sequence of C. elegans is essentially complete. The 
sequence follows those of viruses, several bacteria, and a yeast (1, 2) 
and is the first from a multicellular organism. Some small gaps remain 
to be closed, but this will be a prolonged process without much 
biological return. It therefore now makes sense to review the project 
as a whole. 

Here, we describe the origins of the project, the reasons for 
undertaking it, and the methods that have been used, and we provide 
a brief overview of the analytical findings. The project began with the 
development of a clone-based physical map (3, 4) to facilitate the 
molecular analysis of genes, which were being discovered at an ever 
increasing pace through the study of mutants. This, in turn, initiated a 
collaboration between the C. elegans Sequencing Consortium and the 
entire community of C. elegans researchers (5). The resulting free 
exchange of data and the immediate release of map information (and 
later sequence) have been hallmarks of the project. The resultant cross 
correlation between physical and genetic maps is ongoing and is 
essential for achieving an increasing utility of the sequence. 

Along with the genome sequencing project, expressed sequence 
tag (EST) sequencing has been carried out. Early surveys of expressed 
sequences were conducted (6), but complementary DNA (cDNA) 
analysis has been carried out primarily by Y. Kohara (7). This group 
has contributed 67,815 ESTs from 40,379 clones, representing an 
estimated 7432 genes. This extensive information has been invaluable 
in identifying and annotating genes in the genomic sequence. Others 
also contributed the 15-kilobase (kb) mitochondrial genome sequence 
(8). 

Sequencing 

The preexisting physical map, on which sequencing was based, had 
been initiated by the isolation and assembly of random cosmid clones 
(with a 40-kb insert, which was the largest insert cloning system 
available at the time) with a fingerprinting method (3). At a sixfold 
redundant coverage of the genome in cosmids, nonrandom gaps 
persisted. In most cases, hybridization screening of cosmid libraries 
failed to yield bridging clones, but the newly developed yeast artificial 
chromosome (YAC) clones (9) rapidly closed most of the cosmid 
gaps. Incidentally, the YAC clones also covered almost all of the 
genome, providing a convenient tool for the rapid scanning of the 
entire genome by hybridization (4). About 20% of the genome is 
represented only in YACs. 

By 1989, it became apparent that, with the physical map in hand, 
complete sequencing of the genome might be both feasible and 
desirable. Joint funding [from the National Institutes of Health and the 
UK Medical Research Council (MRC)] for a pilot study was arranged, 
and in 1990, the first 3-megabase (Mb) sequence was undertaken. 
Success in this venture (10, 11) resulted in full funding and the 
expansion of the two groups of the consortium in 1993. 

Sequencing began in the centers of the chromosomes, where 
cosmid coverage and the density of genetic markers are high. Cosmids 
were selected by fingerprint analysis to achieve a tiling path of 
overlapping clones (in practice, 25% overlap on average). Some 
sequencing of YACs was explored (12), but because of yeast DNA 
that contaminated preparations of YAC DNA, this approach was 
deferred in anticipation of the complete sequence of yeast, which 
enabled contaminating reads to be easily identified. The sequencing 
process (13) can be divided into two major parts: the shotgun phase, 
which is sequence acquisition from random subclones, and the finishing 
phase, which is directed sequence acquisition to close any 
remaining gaps and to resolve ambiguities and low-quality areas. 
Numerous and ongoing improvements to the shotgun phase have 
increased sequencing efficiency, improved data quality, and lowered 
costs. Similarly, finishing tools have improved dramatically. Nonetheless, 
finishing still requires substantial manual intervention, with a 
variety of specialized techniques (14, 15). 
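
The selection of a tiling path of overlapping clones can be pictured as an interval-cover problem. The sketch below is only an illustration and is not the consortium's procedure: it assumes the clones have already been placed on a common coordinate axis, and the clone names, coordinates, and the function `minimal_tiling_path` are hypothetical. It uses a greedy rule that minimizes the number of clones, whereas the path described above accepted about 25% overlap on average.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick a small set of overlapping clones covering [region_start, region_end].

    clones: list of (name, start, end) tuples on a shared coordinate axis
    (hypothetical input; a real map would derive these from fingerprints).
    Greedy rule: among clones starting at or before the covered position,
    take the one that reaches farthest to the right.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Scan every clone that begins inside the already covered stretch.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no clone bridges the gap at position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


# Toy example with coordinates in kb (invented values).
example_clones = [("A", 0, 40), ("B", 10, 45), ("C", 30, 70),
                  ("D", 60, 100), ("E", 35, 65)]
print([name for name, _, _ in minimal_tiling_path(example_clones, 0, 100)])
```

When no clone bridges a position, the function raises an error, which loosely corresponds to the cosmid gaps that later had to be closed with fosmids, long-range PCR, or YACs.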

Restriction digests with several enzymes were performed on most 
cosmids and provided valuable checks on sequence assembly. Where 
assembly was ambiguous because of repeats, the digests were helpful 
in resolving the problem. At the start of the project, polymerase chain 
reaction (PCR) checks were conducted along the length of the sequence 
to confirm that the assembled sequence of the bacterial clone 
was an accurate representation of the genome. These checks were 
abandoned after it became clear that failures in PCR were more 
common than discrepancies between the clone and the genome. 

When available cosmids were exhausted, we screened fosmids 
(which are similar to cosmids but are maintained at a single copy per 
cell and thus are potentially more stable) (16) and found that a third 
of the gaps were bridged in the central regions of the chromosomes 
but very few were bridged in the outer regions. We also used 
long-range PCR (17) to recover some of the central gaps. The 
remainder of the central gaps and all of the gaps in the outer regions 
were recovered by sequencing YACs. As for the cosmids, a tiling path 
of YACs was chosen, and DNA from selected clones was isolated by 
pulsed-field gel electrophoresis (18). Sequencing was performed as 
for cosmids, with suitable adaptations for the smaller amount of DNA 
that was available for making libraries. Restriction digests were 
carried out for assembly checks, but they were not as precisely 
interpretable as those for bacterial clones. At this stage, the physical 
map was consolidated and sometimes rearranged as the YAC sequences 
confirmed or rejected the links made previously by hybridization. 
The comparison of the assembled YAC sequences with the 
often extensively overlapping cosmid sequences showed few discrepancies 
between the two sequences. Generally, further investigation 
revealed that most discrepancies resulted from a rearrangement in the 
cosmid. It is interesting (and crucial to the success of the YAC 
sequencing) that nearly all regions of the YACs can be cloned in 
bacteria as short fragments, although cosmid and fosmid libraries 
failed to represent these regions. 

The key step in closing sequence assemblies was to obtain subclones 
that bridged the gaps remaining after the shotgun phase. Often, 
gaps are spanned by the subclones used in the shotgun phase, because 
the insert length is deliberately set at two to four times the typical 
sequence read length. The introduction of plasmid clones halfway 
through the program greatly improved the coverage of inverted repeats 
and other unusual structures. In cases where the shotgun phase 
failed to yield a spanning subclone, plasmid clones that bridged gaps 
were obtained by isolating and subcloning restriction fragments from 
cosmids. In YACs, because of their greater size and complexity, 
screening by hybridization was necessary to recover the desired 
subclone. In the most difficult cases, we have exploited very short 
insert plasmid libraries to find gap-bridging clones. PCR was used 
occasionally, but because of its tendency to yield artifacts in repeat 
regions, it has recently been used as little as possible. Once isolated, 
the gap-bridging clone was either sequenced directly or, in cases of a 
difficult secondary structure, a short insert library (SIL) was made by 
breaking the insert of the gap-bridging clone into smaller fragments 
(0.5 kb or even smaller in difficult cases), with break points interrupting 
the secondary structure (15). In some cases, transposon insertion 
has been used (19), although SILs are generally preferred as a 
first pass because of their ease of throughput. 

The 97-Mb sequence is a composite of 2527 cosmids, 257 YACs, 
113 fosmids, and 44 PCR products (20, 21). For the 12 chromosome 
ends, nine of the telomere plasmid clones provided by Wicky et al 
have been linked to the outermost YACs (22), either directly by 
sequence or by long-range PCR and sequencing, where no direct 
sequence link was found. This probably represents >99% of the 
genome sequence, on the basis of the representation in the genomic 
sequence of available EST data and of the sequence from random 
clones from a whole-genome library. 

Much of the remaining DNA likely resides in the three residual 
gaps between the telomeres and the outermost sequenced YACs 
and in two internal gaps, where no spanning YAC clone has been 
identified. One of these is known to be <450 kb, on the basis of 
Southern (DNA) analysis, but a reliable size estimate is not 
available for the other gaps. A smaller amount will be recovered 
from four smaller segments (which are spanned by YACs), where 
shotgun sequencing has not been completed. Furthermore, very 
small segments (likely to be <1 kb each) have not been recovered 
in subclones for 139 segments. Finally, some sequence is likely to 
be missing from the large tandem repeats, which, in extreme cases, 
consist of tens of kilobases that are composed of hundreds of 
copies of a short sequence. Although most have been sized by 
restriction digestion of the cloned DNA, some segments in the 
larger YACs are of unknown size. Having established the repeat 
elements, we cannot usefully work further on them at this stage, 
because they are likely to be variable and because they do not clone 
stably; any repeat elements that prove to be important will become 
the subject of population studies in the future. 

As shown by the resolution of discrepancies resulting from matches 
with sequence data from other sources, the error rate of almost all 
the product is <10^-4. In a few regions (predominantly in regions of 
extensive tandem repeats), the sequence is tagged to indicate that a 
lower standard of accuracy has been accepted. Accuracy is maintained 
by a set of criteria (23), which is followed by the finisher and by a 
final checking step that requires specialized software (24) and a visual 
inspection. None of this, however, overcomes errors in the cloning 
process. A comparison of different clones in overlapping regions and 
the resolution of discrepancies have indicated a finite error rate 
associated with cloning. For example, cosmid B0393 (GenBank accession 
number Z37983) contains a deletion of a large hairpin that 
was only detected because it overlapped cosmid F17C8 (GenBank 
accession number Z35719); similarly, we detected a 400-base pair 
region that had been deleted in all M13 and PCR reads from cosmid 
F59D12 (GenBank accession number Z81558). The F59D12 deletion 
was detected by restriction digestion and was recovered in plasmids. 
However, these instances are rare enough that undetected errors are 
likely to be few; thus, the advantages of the clone-based sequence, in 
avoiding long-range confusion in assembly, more than make up for its 
occasional defects. 
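
As a rough illustration of what an error rate below 10^-4 implies, the sketch below bounds the number of erroneous bases in the whole sequence and in one clone. The 40-kb clone size is the cosmid insert size quoted earlier; applying the bound uniformly across the sequence is a simplifying assumption, and the function name is hypothetical.

```python
GENOME_BP = 97_000_000     # total sequence length from the text
COSMID_BP = 40_000         # cosmid insert size from the text
BASES_PER_ERROR = 10_000   # an error rate below 1e-4 is fewer than 1 error per 10,000 bases


def max_expected_errors(length_bp, bases_per_error=BASES_PER_ERROR):
    """Upper bound on erroneous bases, using integer arithmetic to avoid rounding noise."""
    return length_bp // bases_per_error


print(max_expected_errors(GENOME_BP))  # upper bound for the whole 97-Mb sequence
print(max_expected_errors(COSMID_BP))  # upper bound for a single cosmid
```

Under these assumptions the bound is fewer than 9700 erroneous bases genome-wide and fewer than 4 per cosmid, excluding the tagged tandem-repeat regions where a lower standard was accepted.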

Sequence Content 

Whereas the sequencing has essentially been completed, analysis and 
annotation will continue for many years, as more information and 
better sequence annotation tools become available. 

To begin the task, we subjected each completed segment to a series 
of automatic analyses to reveal possible protein (25) and transfer RNA 
(tRNA) genes (26), similarities to ESTs and other proteins (27-30), 
repeat families, and local repeats (31). The results were entered in the 
genome database "a C. elegans database" (ACEDB) (32), which 
merges overlapping sequences to provide seamless views across clone 
boundaries and allows the periodic and automatic updating of entries. 
To integrate and reconcile the various views of the sequence, we 
reviewed all data interactively through the ACEDB annotator's graphical 
workbench (32). In particular, the GENEFINDER (25) predictions 
are confirmed or adjusted to account for protein, cDNA, and 
EST matches, repeats, and so forth, and annotation concerning putative 
gene function is added. 

The interruption of the coding sequence by introns, the generation 
of alternatively spliced forms, and the relatively low gene density 
make accurate gene prediction more challenging in multicellular 
organisms than in microbial genomes. The problem is made more 
complex in C. elegans by trans-splicing and by the organization of as 
many as 25% of the genes into operons (33). We have used GENEFINDER 
to identify putative coding regions and to provide an initial 
overview of gene structure. To quantitate the accuracy of gene 
identification, we compared introns that were confirmed by ESTs and 
cDNAs to those that were predicted by GENEFINDER. We found that 
92% of the predicted introns had an exact match to the experimentally 
confirmed ones and that 97% had an overlap. Identification of the 
start and stop of genes is more difficult, and errors in this process 
sometimes result in the merging of some neighboring genes and in 
the splitting of others. To refine the computer-generated gene 
structure predictions, expert annotators use any available EST and 
protein similarities, as well as genomic sequence data from the 
related nematode C. briggsae. This information can be especially 
important in establishing gene boundaries. About 40% of the 
predicted genes have a confirming EST match, but because ESTs 
are partial, they presently confirm only ~15% of the total coding 
sequence. In a number of cases, ESTs have provided direct evidence 
of alternative splicing; these instances have been annotated 
in the sequence (34). 
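
The intron comparison described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: introns are represented as hypothetical (start, end) coordinate pairs, the function name `intron_agreement` is invented, and the predicted introns are taken as the denominator, which is one reading of the text.

```python
def intron_agreement(predicted, confirmed):
    """Return (exact_fraction, overlap_fraction) for predicted introns.

    predicted, confirmed: lists of (start, end) tuples with inclusive coordinates.
    exact   = predicted intron has identical boundaries to a confirmed intron
    overlap = predicted intron shares at least one base with a confirmed intron
    """
    if not predicted:
        raise ValueError("no predicted introns to evaluate")
    confirmed_set = set(confirmed)
    exact = sum(1 for p in predicted if p in confirmed_set)
    overlap = sum(
        1 for p in predicted
        if any(p[0] <= c[1] and c[0] <= p[1] for c in confirmed)
    )
    return exact / len(predicted), overlap / len(predicted)


# Invented coordinates: two exact matches, one boundary error, one miss.
predicted = [(100, 200), (300, 400), (500, 620), (800, 900)]
confirmed = [(100, 200), (300, 400), (500, 600), (1000, 1100)]
exact_fraction, overlap_fraction = intron_agreement(predicted, confirmed)
print(exact_fraction, overlap_fraction)
```

The gap between the two fractions (92% versus 97% in the text) corresponds to predictions that find the right intron but misplace one of its boundaries.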

The genes. The 97-Mb total sequence contains 19,099 predicted 
protein-coding genes, 16,260 of which have been interactively 
reviewed, for an average density of 1 predicted gene per 5 kb (35). Each 
gene has an average of five introns, and 27% of the genome resides in 
predicted exons. The number of genes is about three times that found 
in yeast (2) and is about one-fifth to one-third the number predicted 
for humans. As expected from earlier estimates that were based on 
much smaller amounts of genome sequence, the number of predicted 
genes is much higher than the number of essential genes that was 
estimated from classical genetic studies (10, 36). 
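
The gene-density figure can be checked with simple arithmetic. In the sketch below, the inputs come from the paragraph above; the derived per-gene and per-exon averages are back-of-envelope values that the text does not state, and they assume that a gene with five introns has six exons.

```python
GENOME_BP = 97_000_000     # total sequence
PREDICTED_GENES = 19_099   # predicted protein-coding genes
EXON_FRACTION = 0.27       # share of the genome in predicted exons
INTRONS_PER_GENE = 5       # average quoted in the text

bp_per_gene = GENOME_BP / PREDICTED_GENES
exon_bp_per_gene = EXON_FRACTION * GENOME_BP / PREDICTED_GENES
exons_per_gene = INTRONS_PER_GENE + 1              # assumption: n introns split n + 1 exons
mean_exon_bp = exon_bp_per_gene / exons_per_gene   # derived, not from the text

print(f"{bp_per_gene / 1000:.1f} kb of genome per predicted gene")
print(f"{exon_bp_per_gene:.0f} bp of predicted exon per gene")
print(f"{mean_exon_bp:.0f} bp per exon on average")
```

The first line reproduces the quoted density of roughly 1 gene per 5 kb; the other two are only as reliable as the averages they are built from.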

Similarities to known proteins provide a glimpse of the possible 
function of the predicted genes. Approximately 42% of predicted 
protein products have distant matches (outside Nematoda); most of 
these matches contain functional information (37). Another 34% of 
predicted proteins match only other nematode proteins, but only a few 
of these have been functionally characterized. The fraction of genes 
with informative similarities is far lower than the 70% seen for 
microbial genomes. This may reflect the smaller proportion of nematode 
genes that are devoted to core cellular functions (38), the 
comparative lack of knowledge of functions involved in building an 
animal, and the evolutionary divergence of nematodes from other 
animals studied extensively at the molecular level. 

We compared the available protein sets from C. elegans, Escherichia 
coli, Saccharomyces cerevisiae, and Homo sapiens to highlight qualitative 
differences in the predicted protein sets (39) (Fig. 1). Generally, we 
found that smaller genomes had matches to a larger fraction of their 
protein sets and larger genomes had higher numbers of matching proteins. 
As expected from evolutionary relationships, substantially more protein 
similarities were found between C. elegans and H. sapiens than 
in any other cross-species pairwise comparison. There were also a 
substantial number of proteins common to C. elegans and E. coli that 
were not found in yeast. Similarly, C. elegans lacked proteins that were 
found in both yeast and E. coli (38). 

Genes encoding proteins with distant matches (outside Nematoda) 
were more likely to have a matching EST (60%) than those without 
such matches (20%). This observation suggests that conserved genes 
are more likely to be highly expressed, perhaps reflecting a bias for 
"housekeeping" genes among the conserved set. Alternatively, genes 
lacking confirmatory matches may be more likely to be false predictions, 
although our analyses do not support this (40). 

We have also used the Pfam protein family database (41) to 
classify common protein domains in the nematode genome. Of the 20 
defined domains that occur most frequently (Table 1), the majority are 
implicated in intercellular communication or in transcriptional regulation. 
We find comparatively fewer examples of second messenger 
proteins (for example, 54 G-beta and 3 Src homology 2 domains). 
This finding supports models in which the same intracellular signaling 
pathways are used with variant receptors and transcription factors in 
different cell states. 

In addition to the protein-coding genes, the genome contains at 
least several hundred genes for noncoding RNAs. There are 659 
widely dispersed tRNA genes and at least 29 tRNA-derived pseudogenes 
(42). Forty-four percent of the tRNA genes are found on the X 
chromosome, which contains only 20% of the total sequence. Several 
other noncoding RNA genes occur in dispersed multigene families. 



The U1, U2, U4, U5, and U6 spliceosomal RNA genes occur in 14, 
21, 5, 12, and 20 dispersed copies, respectively; there are five 
dispersed copies of signal recognition particle RNA genes, and there 
are at least four dispersed copies of splice leader 2 (SL2) RNA genes. 
A striking feature of these dispersed gene families is their high degree 
of sequence homogeneity. For example, of the 20 U6 RNA genes, 17 
are 100% identical to each other. Either gene conversion or recent 
gene duplications may account for this homogeneity. Several of these 
RNA genes occur in the introns of protein-coding genes, which may 
indicate RNA gene transposition. In general, RNA genes in introns do 
not appear to occur preferentially in the coding orientation of the 
encompassing transcript, which indicates that these RNA genes are 
probably expressed independently. 

Other noncoding RNA genes occur in long tandem arrays. The 
ribosomal RNA genes occur solely in such an array at the end of 
chromosome I. The 5S RNA genes occur in a tandem array on 
chromosome V, with array members separated by SL1 splice leader 
RNA genes. A few other known RNA genes, such as the small 
cytoplasmic Ro-associated Y RNA and the lin-4 regulatory RNA, are 
found only once in the genome. Some RNA genes that are expected 
to be present in the genome have yet to be identified, probably 
because they are poorly conserved at both the sequence and secondary 
structure level. These include ribonuclease P RNA, telomerase RNA, 
and 100 or more small nucleolar RNA genes. 

Repetitive sequences. Some of the sequence that does not code for 
protein or RNA is undoubtedly involved in gene regulation or in the 
maintenance and movement of chromosomes. A significant fraction of 
the sequence is repetitive, as in other multicellular organisms. We 
have classified repeat sequences as either local (that is, tandem, 
inverted, or simple sequence repeats) or dispersed. 

Tandem repeats account for 2.7% of the genome and are found, on 
average, once per 3.6 kb. Inverted repeats account for 3.6% of the 
genome and are found, on average, once per 4.9 kb. Many repeat 
families are distributed nonuniformly with respect to genes and, in 
particular, are more likely to be found within introns than between 
genes. For example, although only 26% of the genome sequence is 
predicted to be intronic, it contains 51% of the tandem repeats and 
45% of the inverted repeats. The 47% of the genome sequence that is 
predicted to be intergenic contains only 49% of the tandem repeats 
and 55% of the inverted repeats. As expected, only a small percentage 




Fig. 1. Percentages of matching proteins resulting from pairwise com- 
parisons (39). The organisms and the number of proteins used in the 
analysis are shown in boxes. For S. cerevisiae (a fungus), C. elegans (a 
nematode), and E. coli (a bacterium), the numbers reflect proteins that 
were predicted from an essentially complete genome sequence. The 
direction of the arrows indicates how the comparison was performed. 
Numbers that are adjacent to the arrows indicate the percentage of 
proteins that were found to match. Numbers that are underlined and in 
bold-faced type indicate the percentage of C. elegans proteins that were 
found to match each of the other organisms. 



Table 1. The 20 most common protein domains in C. elegans (41). RRM, RNA 
recognition motif; RBD, RNA binding domain; RNP, ribonuclear protein motif; 
UDP, uridine 5'-diphosphate. 

Number   Description 
650      7 TM chemoreceptor 
410      Eukaryotic protein kinase domain 
240      Zinc finger, C4 type (two domains) 
170      Collagen 
140      7 TM receptor (rhodopsin family) 
130      Zinc finger, C2H2 type 
120      Lectin C-type domain short and long forms 
100      RNA recognition motif (RRM, RBD, or RNP domain) 
90       Zinc finger, C3HC4 type (RING finger) 
90       Protein-tyrosine phosphatase 
90       Ankyrin repeat 
90       WD domain, G-beta repeats 
80       Homeobox domain 
80       Neurotransmitter-gated ion channel 
80       Cytochrome P450 
80       Helicases conserved C-terminal domain 
80       Alcohol/other dehydrogenases, short-chain type 
70       UDP-glucoronosyl and UDP-glucosyl transferases 
70       EGF-like domain 
70       Immunoglobulin superfamily 






11 DECEMBER 1998 VOL 282 SCIENCE www.sciencemag.org 



C. ELEGANS: SEQUENCE TO BIOLOGY 

of the tandem repeats overlaps with the 27% of the genome encoding 
proteins. 
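
The intron and intergenic percentages quoted above can be turned into a simple enrichment ratio: the share of repeats found in a region divided by the share of the genome that region occupies. The following is a minimal sketch using only the figures given in the text; the function and variable names are illustrative and are not part of the original analysis.

```python
# Percentages quoted in the text: share of the genome occupied by each
# region, and share of tandem / inverted repeats found in that region.
REGIONS = {
    "intronic":   {"genome": 26, "tandem": 51, "inverted": 45},
    "intergenic": {"genome": 47, "tandem": 49, "inverted": 55},
}


def enrichment(repeat_pct: float, genome_pct: float) -> float:
    """Observed share of repeats divided by the share expected from region size."""
    return repeat_pct / genome_pct


for name, region in REGIONS.items():
    for kind in ("tandem", "inverted"):
        ratio = enrichment(region[kind], region["genome"])
        print(f"{name:10s} {kind:8s} {ratio:.2f}")
```

A ratio above 1 means that the region holds more repeats than its size alone would predict; the intronic ratios come out well above the intergenic ones.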

Although local repeat structures are often unique in the genome, 
others come in families. For example, repeat sequence CeRep26 is 
the tandemly occurring hexamer repeat TTAGGC, which is seen at 
multiple sites that are internal to the chromosomes in addition to 
the telomeres (22). CeRep26 and CeRep27 are excluded from 
introns, whereas other repeat families show a slight positive bias 
toward introns. The reason for the biased distribution of these 
repeats is unclear. Furthermore, some repeat families show a 
chromosome-specific bias in representation. For example, 
CeRep11, with 711 copies distributed over the autosomes, has only 
one copy located on the X chromosome. 
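
As an illustration of how tandem copies of a short unit such as the TTAGGC hexamer might be located in a sequence, here is a minimal sketch. It scans both strands with a regular expression; the minimum of two tandem copies and all function names are assumptions made for this example, not the criteria used in the original analysis.

```python
import re


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tandem_sites(seq: str, unit: str = "TTAGGC", min_copies: int = 2):
    """Return sorted (start, copies, strand) tuples for tandem runs of `unit`.

    `min_copies` is an assumed threshold chosen for illustration.
    """
    sites = []
    for strand, motif in (("+", unit), ("-", revcomp(unit))):
        pattern = re.compile(f"(?:{motif}){{{min_copies},}}")
        for match in pattern.finditer(seq):
            copies = len(match.group()) // len(motif)
            sites.append((match.start(), copies, strand))
    return sorted(sites)


# Toy sequence, not taken from the C. elegans genome.
toy = "ACGT" + "TTAGGC" * 3 + "AAAA" + "GCCTAA" * 2 + "T"
print(tandem_sites(toy))
```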

Altogether, we have recognized 38 dispersed repeat families. Most 
of these dispersed repeats are associated with transposition in some 
form (43) and include the previously described known transposons of 
C. elegans. However, these repeat elements may not explicitly encode 
an active transposon (44). For example, we have found four new 
families of the Tc1/mariner type, but these are highly divergent from 
each other and the other family members; they are probably no longer 
active in the genome. 

In addition to multicopy repeat families, we observe a substan- 
tial amount of simple duplication of sequence, that is, segments 
ranging from hundreds of bases to tens of kilobases that have been 
copied in the genome. In one case, a segment of 108 kb containing 
six genes is duplicated tandemly with only 10 sites observed to be 
different between the two copies. At the left end of chromosome 
IV, immediately adjacent to the telomere, an inverted repeat is 
present where each copy of the repeat is 23.5 kb, with only eight 
different sites found between the two copies. Many cases of shorter 
duplications are found, which are often separated by tens of 
kilobases or more that may also contain a coding sequence. These 
duplications could provide a mechanism for copy divergence and 
the subsequent formation of new genes. In one example, two 
2.5-kb segments, separated by 200 kb, were found to contain genes 
exhibiting a 98% sequence identity (C38C10.4 and F22B7.5). EST 
data indicate that both genes are expressed. More commonly, gene 
duplications are local. In a search for local clusters of duplicated 
genes, 402 clusters were found distributed throughout the genome 
(Fig. 2). 

Chromosome organization. At first sight, the genome looks re- 
markably uniform; GC content (36%) is essentially unchanged across 
all the chromosomes, unlike the GC content in vertebrate genomes, 
such as human, or yeast (45). There are no localized centromeres as 
found in most other metazoa. Instead, the extensive, highly repetitive 
sequences that are characteristic of centromeres in other organisms 
may be represented by some of the many tandem repeats found 
scattered among the genes, particularly on the chromosome arms. 
Gene density is also fairly constant across the chromosomes, although 
some differences are apparent, particularly between the centers of the 
autosomes, the autosome arms, and the X chromosome (Table 2 and 
Fig. 3). 

Striking differences become evident after an examination of 
other features. Both inverted and tandem repetitive sequences are 
more frequent on the autosome arms (Fig. 3) than in the central 
regions of the chromosomes or on the X chromosome. For exam- 
ple, CeRep26 is virtually excluded from the centers of the auto- 
somes (Fig. 3). (The abundance of repeats on the arms is likely to 
be a contributing factor to the difficulties in cosmid cloning and 
sequence completion in these regions.) The fraction of genes with 



Table 2. Gene density. Autosomes are divided into the genetically defined 
compartments of the left arm (L), the central cluster region (C), and the right 
arm (R). The percentage of genes with EST and database matches was 
determined only from manually inspected genes. Database matches to non- 
nematode proteins were determined with WUBLASTP (P < 0.001). Parenthe- 
ses denote the number of low-scoring predictions thought to be pseudogenes. 

Fig. 2. Locations by chromosome (shown by roman numerals) of local 
gene clusters. The x axis represents the physical distance in kilobases 
along the chromosomes. The y axis represents the size of the clusters. For 
example, the chitinase cluster on chromosome II contains 17 chitinase- 
like genes. Local gene clusters were determined by searching for all cases 
of N genes that are similar within a window of 2N genes along the 
chromosomes (for example, three similar genes within a window of six 
were considered a cluster; clusters were extended until no similar genes 
could be added). Clusters of N = 3 or more were plotted. The criterion 
for similarity was defined as a BLASTP score of at least 200. ATP, 
adenosine 5'-triphosphate; TM, transmembrane; Mem. Recep., mem- 
brane receptor; SCP/TPX, a family of proteins (SCP, sperm-coating gly- 
coprotein; TPX, Tpx-1, a testis-specific protein). 
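
The "N similar genes within a window of 2N genes" rule described in the Fig. 2 legend can be sketched as a greedy scan along the gene order. This is only one possible reading of the legend, not the authors' implementation. It assumes that each gene has already been given a family label (for example, by grouping genes whose BLASTP score is at least 200); the function name and input format are hypothetical.

```python
from collections import defaultdict


def local_clusters(families, min_size=3):
    """Find local clusters of similar genes.

    families: one family label per gene, in chromosomal order
              (None for genes with no similar partner).
    Returns a list of (family, [gene indices]) where the N members
    fit inside a window of at most 2N consecutive genes.
    """
    positions = defaultdict(list)
    for index, family in enumerate(families):
        if family is not None:
            positions[family].append(index)

    clusters = []
    for family, pos in positions.items():
        i = 0
        while i < len(pos):
            members = [pos[i]]
            j = i + 1
            # Extend the cluster until no further similar gene can be added
            # without the members spanning more than 2N genes.
            while j < len(pos):
                n = len(members) + 1
                span = pos[j] - members[0] + 1
                if span > 2 * n:
                    break
                members.append(pos[j])
                j += 1
            if len(members) >= min_size:
                clusters.append((family, members))
            i = j
    return clusters


# Toy gene order: family "a" has three nearby members and one distant one.
genes = ["a", None, "a", "a", None, None, None, None, None, "a", "b", "b", "b"]
print(local_clusters(genes))
```

Grouping genes into families beforehand is a simplification: pairwise similarity need not be transitive, so a real analysis would have to decide how to handle genes that match only some members of a cluster.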

similarities to organisms other than nematodes tends to be lower on 
the arms, as does the fraction of genes with EST matches. The 
difference between autosome arms and central regions is even 
more obvious in the number of EST matches (46). The local gene 
clusters described above also appear to be more abundant on the 
arms. 

These features, together with the fact that meiotic recombina- 
tion is much higher on the autosome arms, suggested that the DNA 
on the arms might be evolving more rapidly than in the central 
regions of the autosomes. If so, one might expect that the con- 
served set of eukaryotic genes shared by yeast and C. elegans 
would be largely excluded from the arms. To test this, we identi- 
fied 1517 proteins in C. elegans that are highly similar to yeast 
genes and plotted their location along the length of the chromo- 
somes (Fig. 3). For four of the five autosomes, the differences in 
the distribution of core genes are quite striking, with surprisingly 




[Fig. 3 key: sequence; predicted genes; EST matches; yeast similarities; inverted repeats; tandem repeats; TTAGGC repeats.] 

Fig. 3. Distributions of predicted genes; EST matches; yeast protein 
similarities; and inverted, tandem, and TTAGGC repeats along each 
chromosome. Gene density varies little along and among the autosomes. 
On the X chromosome, genes appear at a lower density and are more 
evenly distributed. In contrast, the frequency of EST matches varies 
according to their position along the autosomes, indicating a clustering 
of highly expressed genes. The chromosomal locations of these clusters 
correlate well with the chromosomal locations of gene products that 
exhibit significant similarities to yeast proteins (P value of 10^-9). For the 
autosomes, repeat density varies dramatically with chromosomal posi- 
tion and is highest on the arms. The density of inverted and tandem 
repeats on the X chromosome is more uniform, but similar to the 
autosomes, TTAGGC repeats tend to be located on the arms. Supple- 
mental information regarding the analysis can be found at 
www.sciencemag.org/feature/data/c-elegans.shl for a general overview. 



sharp boundaries evident. These boundaries appear close to the 
boundaries in the genetic map that separate regions of high and low 
rates of recombination (47). 

Conclusions 

There are several reasons for completely sequencing a genome. The 
first and most simple reason is that it provides a basis for the 
discovery of all the genes. Despite the power of cDNA analysis and 
its enormous value in interpreting genome sequence, it is now gen- 
erally recognized that a direct look at the genome is needed to 
complete the inventory of genes. Second, the sequence shows the 
long-range relationships between genes and provides the structural 
and control elements that must lie among them. Third, it provides a set 
of tools for future experimentation, where any sequence may be 
valuable and completeness is the key. Fourth, sequencing provides an 
index to draw in and organize all genetic information about the 
organism. Fifth, and most important over time, is that the whole is an 
archive for the future, containing all the genetic information required 
to make the organism (the greater part of which is not yet understood). 
As a resource, the sequence will be used indefinitely not only by C. 
elegans biologists, but also by other researchers for the comparison with 
and the interpretation of other genomes, including the human genome. 

As was already known, the genome of a multicellular organism is 
very different from that of a microbial organism (and even different 
from that of a eukaryote such as yeast). It is predominantly noncoding, 
with genes extended (sometimes over many kilobases) by introns. 
Rather than acting primarily as the source for a set of protein 
sequences, the genomic sequence itself remains the primary focus of 
annotation. There are two reasons for this. First, much information 
about biological function is located in noncoding sequences; second, 
current methods of gene identification, both experimental and com- 
putational, are not yet accurate and complete enough to provide a 
definitive set of protein sequences. 

If we began again now, would we employ the same approach? 
Almost certainly (48). The clone-based physical map was a critical 
factor in organizing the project between the two sites. The clones of 
the map have also been valuable reagents for the research community 
and continue to be so; the discrete assemblies of cosmids and YACs 
have been essential to disentangling extensive repeats in many areas. 
For the numerous small areas that are underrepresented in shotgun 
assemblies, rare subclones can be readily recovered from the cosmid 
and YAC subclone libraries. 

There are two minor changes that we would make in the sequenc- 
ing approach. First, we would add longer insert bacterial clones (for 
example, bacterial artificial chromosomes) to the map, fingerprinting 
them in the same manner as cosmids (48). Second, we would begin 
YAC sequencing earlier in the project. That we did not do so on this 
occasion was for historical reasons [in particular, the availability of 
the yeast genome sequence (see above)]. 

How important has the worm project been to the Human Genome 
Project? Through feedback from many sources, we gather that it has 
been influential in showing what can be done. Certainly, it is remark- 
able to look back to 1992, when a paper concerning just three cosmids 
was published as an important milestone (10). Undoubtedly, the worm 
project has contributed to technology and software development; it is 
not a unique test-bed, but along with the other genome projects, it has 
explored ways of increasing scale and efficiency. 

Where is the finish line? This publication marks more of a 
beginning than an end and is another milestone in an ongoing process 
of the analysis of C. elegans biology. It is not very meaningful at any 
particular point to call genomes of this size finished, because of the 
inevitable imperfections that will only gradually be resolved. This is 
true no matter what method of sequencing is adopted. The important 
thing is not a declaration of completion, but rather the provision of the 
best possible tools to the users at every stage and a commitment to 
maintenance and improvement, through interaction with the user 
community, as long as that is needed. 

Zinc Fingers in Caenorhabditis elegans: 
Finding Families and Probing Pathways 

Neil D. Clarke and Jeremy M. Berg 



REVIEW 

More than 3 percent of the protein sequences inferred from 
the Caenorhabditis elegans genome contain sequence motifs 
characteristic of zinc-binding structural domains, and of these 
more than half are believed to be sequence-specific DNA- 
binding proteins. The distribution of these zinc-binding do- 
mains among the genomes of various organisms offers in- 
sights into the role of zinc-binding proteins in evolution. In 
addition, the complete genome sequence of C. elegans pro- 
vides an opportunity to analyze, and perhaps predict, path- 
ways of transcriptional regulation. 

Less than 15 years ago, it was suggested that repeated sequences 
found in transcription factor IIIA (TFIIIA) of Xenopus might fold 
into structural domains stabilized by the binding of zinc to con- 
served cysteine and histidine residues (1-3). Klug and co-workers 
further noted that "it would not be surprising if the same 30 residue 
units were found to occur in varying numbers in other related gene 
control proteins" (7). This proposal proved remarkably prescient: 
Caenorhabditis elegans, for example, turns out to have more than 
100 such proteins, and the number of domains per protein varies 
from one to perhaps as many as fourteen. Unanticipated at the time, 
though, was the fact that the zinc-binding motif found in TFIIIA is 
just one of many small zinc-binding domains, a number of which 
are involved in gene regulation. The properties of a few of these 
domains have been summarized recently (4). 

Eukaryotes contain a much greater number of proteins with 
well-characterized zinc-binding motifs than do bacterial and ar- 
chaeal organisms (Table 1). The complete genome of Caenorhab- 
ditis elegans (a metazoan), in conjunction with that of Saccharo- 
myces cerevisiae (a yeast), presents a special opportunity to ex- 
amine the range and diversity of these gene families in eukaryotes. 
Furthermore, because some of these zinc-binding motifs are se- 
quence-specific DNA-binding proteins, the availability of nearly 
complete sequence information also permits a preliminary analysis 
of the distribution of potential binding sites within the entire 
genome. Such analyses may prove to be of value in deducing 
development control pathways and in more fully defining the 
characteristics of eukaryotic promoters. 

The Cys2His2 Family 

The zinc-stabilized domains of TFIIIA are known as "zinc fingers" or 
Cys2His2 domains. The consensus sequence for this family is 
(Phe,Tyr)-X-Cys-X_{2-4}-Cys-X_3-Phe-X_5-Leu-X_2-His-X_{3-5}-His (5-7). 
In both C. elegans and the yeast S. cerevisiae, roughly 0.7% of all proteins 
contain one or more Cys2His2 zinc finger domains (Table 1). However, 
the distribution of these domains within proteins is rather 
different in the two organisms. In yeast, the majority of zinc finger 
proteins contain exactly two domains, and only a few (~10%) have 
more than two. In contrast, there are more zinc finger proteins in 
C. elegans that have three or more Cys2His2 domains than there are 
proteins that have exactly two (Fig. 1) (8). On the basis of the 
sequences of mammalian and Drosophila zinc finger proteins, it 
appears that the distribution of Cys2His2 domains among C. elegans 
proteins is typical of multicellular organisms. 
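
The consensus sequence given above translates directly into a regular expression over one-letter amino acid codes, which gives a rough way to count candidate Cys2His2 domains in a protein. The sketch below is an illustration only: the test sequence is synthetic, built to fit the consensus, and the function name is hypothetical. It is not the method used to produce the counts in this review.

```python
import re

# (Phe,Tyr)-X-Cys-X(2-4)-Cys-X3-Phe-X5-Leu-X2-His-X(3-5)-His,
# written with one-letter codes; "." stands for any residue (X).
C2H2_CONSENSUS = re.compile(r"[FY].C.{2,4}C.{3}F.{5}L.{2}H.{3,5}H")


def count_fingers(protein: str) -> int:
    """Count non-overlapping matches to the strict consensus."""
    return len(C2H2_CONSENSUS.findall(protein))


# Synthetic sequence constructed to match the consensus; not a real protein.
finger = "YKCPECGKSFSQKSNLQKHQRTH"
linker = "TGEKP"

print(count_fingers(finger))                    # single domain
print(count_fingers(finger + linker + finger))  # two domains in tandem
```

A strict pattern of this kind will miss domains that depart from the consensus at the conserved Phe or Leu positions, so profile-based searches are generally preferred when a complete census is the goal.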

The GATA, LIM, and Hormone Receptor Families: 
Implications for Metazoan Evolution 

The GATA domain, the LIM domain, and the DNA-binding domains 
from nuclear hormone receptors each include a four-cysteine zinc- 
binding domain that can be clustered into the same structural super- 
family, and it is possible that they share a common evolutionary origin 
(Fig. 2) (9, 10). In addition to the Cys4 superfamily domain, LIM 
domains contain a similar LIM-specific Cys2HisCys zinc motif, 
whereas the hormone receptors have a second and distinct Cys4 
domain. GATA proteins frequently contain a pair of Cys4 superfamily 
domains. 

Normalized to the number of genes in their respective genomes, 
the number of GATA and LIM domain homologs is similar in C. 
elegans and S. cerevisiae. In striking contrast, the hormone receptor 
family is completely absent in yeast but is the largest single family of 
zinc-binding domains in C. elegans. In fact, with over 200 family 
members, the hormone receptors make up nearly 1.5% of the entire 
coding sequence of C. elegans. The differences in the distribution of 
nuclear hormone receptors in C. elegans and S. cerevisiae may be 
relevant to the evolution of multicellular animals. As has been noted 
before, the evolution of hormone receptors may have been a key event 
in the development of cell-cell communication and the origins of 
multicellularity in the metazoa (11). 

The ligand-binding domains of the hormone receptors have di- 
verged considerably more than the DNA-binding domains. Applying 
the same criterion for significance to both the DNA- and ligand- 
binding domains of the hormone receptor family, only about 10% of 
the open reading frames (ORFs) that have a DNA-binding domain 

The Genome of the Natural 
Genetic Engineer Agrobacterium 
tumefaciens C58 

Derek W. Wood, 1 Joao C. Setubal, 2,4 Rajinder Kaul, 5 
Dave E. Monks, 1 Joao P. Kitajima, 2,3 Vagner K. Okura, 2 
Yang Zhou, 5 Lishan Chen, 1* Gwendolyn E. Wood, 1 
Nalvo F. Almeida Jr., 6 Lisa Woo, 1 Yuching Chen, 1† 
Ian T. Paulsen, 7 Jonathan A. Eisen, 7 Peter D. Karp, 8 
Donald Bovee Sr., 5 Peter Chapman, 5 James Clendenning, 5 
Glenda Deatherage, 5 Will Gillet, 5 Charles Grant, 5 
Tatyana Kutyavin, 5 Ruth Levy, 5 Meng-Jin Li, 5 Erin McClelland, 5 
Anthony Palmieri, 5 Christopher Raymond, 5 Gregory Rouse, 5 
Channakhone Saenphimmachak, 5 Zaining Wu, 5 Pedro Romero, 8 
David Gordon, 9 Shiping Zhang, 10 Heayun Yoo, 10 Yumin Tao, 11 
Phyllis Biddle, 10 Mark Jung, 10 William Krespan, 10 
Michael Perry, 10 Bill Gordon-Kamm, 11 Li Liao, 10 Sun Kim, 10 
Carol Hendrick, 11 Zuo-Yu Zhao, 11 Maureen Dolan, 10 
Forrest Chumley, 10‡ Scott V. Tingey, 10 Jean-Francois Tomb, 10 
Milton P. Gordon, 12 Maynard V. Olson, 5 Eugene W. Nester 1,13§ 

The 5.67-megabase genome of the plant pathogen Agrobacterium tumefaciens 
C58 consists of a circular chromosome, a linear chromosome, and two plasmids. 
Extensive orthology and nucleotide colinearity between the genomes of A. 
tumefaciens and the plant symbiont Sinorhizobium meliloti suggest a recent 
evolutionary divergence. Their similarities include metabolic, transport, and 
regulatory systems that promote survival in the highly competitive rhizosphere; 
differences are apparent in their genome structure and virulence gene com- 
plement. Availability of the A. tumefaciens sequence will facilitate investiga- 
tions into the molecular basis of pathogenesis and the evolutionary divergence 
of pathogenic and symbiotic lifestyles. 



Agrobacterium tumefaciens is an α-proteobacterium 
of the family Rhizobiaceae and a member of the 
diverse Agrobacterium genus. A ubiquitous soil 
organism and etiological agent of the plant 
disease crown gall (1), 
A. tumefaciens infects more than 90 families 
of dicotyledonous plants, resulting in major 
agronomic losses (2, 3). The gall results from 
the transfer, integration, and expression of a 
discrete set of genes (T-DNA) located on the 
tumor-inducing (Ti) plasmid. Expression of 
these genes leads to biosynthesis of plant 
growth hormones as well as a bacterial nutri- 
ent source called opines (4). The processing 
and transfer of the T-DNA is mediated by the 
Ti plasmid virulence (vir) genes, and several 
virulence determinants initially characterized 
in A. tumefaciens have been found in plant 
symbionts and animal pathogens (5-7). 

The genes within the T-DNA can be re- 
placed by any DNA sequence, making A. 
tumefaciens an ideal vehicle for gene transfer 
and an essential tool for plant research and 
transgenic crop production. The research and 
commercial potential of A. tumefaciens has 
been broadened under laboratory conditions 
to include the transfer of T-DNA to recalcitrant 
plants, fungi (8), and human cells (9). 

A. tumefaciens shares a similar habitat and 
close evolutionary relationship with the nitrogen-fixing 
symbionts of the Rhizobiaceae (10). 
Indeed, the introduction of a symbiotic plasmid 
from Rhizobium phaseoli into A. tumefaciens 
results in the weak but measurable formation of 
nitrogen-fixing root nodules (11), suggesting a 
shared genetic background. The recent publica- 
tion of the genome sequences of two Rhizobi- 
aceae, Sinorhizobium meliloti (12) and Meso- 
rhizobium loti (13), allowed a genome-wide 
comparison with A. tumefaciens. We present 
the results of this comparison as well as a 
detailed analysis of the genome of A. tumefa- 
ciens strain C58 (14, 15). 

General features of the genome. The 
5.67-Mb genome of A. tumefaciens C58 (16) 
comprises four replicons (17): a circular 
chromosome, a linear chromosome, and the 
AtC58 and TiC58 plasmids (Table 1 and Fig. 
1). The genome contains 5419 predicted pro- 
tein-coding genes (14), of which we have 
assigned a putative function to 3475 (64.1%). 
The remaining 1944 genes (35.9%) include 
1236 conserved hypothetical genes (22.8%) 
whose predicted products are similar to proteins 
of unknown function in other genomes, 
and 708 hypothetical genes (13.1%) with no 
significant matches in the sequence databases 
(Table 1). Our analysis assigns the A. tume- 
faciens genes to 501 paralogous families con- 
taining from 2 to 206 members (14). The two 
largest families are composed of genes be- 
longing to the adenosine triphosphatase 
(ATPase) and membrane-spanning compo- 
nents of the ATP binding cassette (ABC) 
transport family. 

The overall GC content of the A. tumefa- 
ciens genome is 58%. The TiC58 plasmid has 
two regions of distinctive GC content: the 
T-DNA (46%) and the vir region (54%) (Fig. 
1). Low GC content was noted previously in 
the T-DNA of a related Ti plasmid (18). 
Reduced GC content (53%) is also seen with- 
in a 24-kb segment of pAtC58 (AT island, 
Fig. 1). This region includes 17 conserved 
hypothetical or hypothetical genes, an ATP- 
dependent DNA helicase, and an insertion 
sequence (IS) element. These genes are 
flanked by a phage integrase and a second IS 
element. The genes in these three regions 
have a distinct codon usage as compared to 
the rest of the genome, consistent with their 
recent evolutionary acquisition (14). 

The genome contains 53 transfer RNAs 
(tRNAs) that represent all 20 amino acids 
(Table 1). These tRNAs are distributed unevenly 
between the circular and linear chromosomes. 
Transfer RNA species corresponding to the most 
frequently represented alanine, glutamine, and 
valine codons are found 
only on the linear replicon. The genome con- 
tains 25 predicted IS elements representing 
eight different families (14). The largest is 
the IS3 family comprising 10 IS elements. 
The IS elements are not equally distributed 
among the replicons but are located preferen- 
tially on the linear chromosome and pAtC58 
(Table 1). The adjacent virHl and virH2 
genes of the Ti plasmid, encoding p450 
mono-oxygenases (19), are flanked by IS elements, 
which suggests that they arrived in A. tumefaciens 
as part of a compound transposon. 
Twelve genes of probable phage origin 
were identified, most of which are on the 
circular chromosome (Table 1). Many of 
these genes cluster in two discrete regions 
and thus may represent prophage remnants. 
None of these clustered phage-related genes 
are shared with S. meliloti, which implies that 
they were lost from S. meliloti or entered the 
A. tumefaciens genome after these organisms 
evolutionarily diverged. 

Phylogeny and whole-genome comparison. 
A comparison with all sequenced 
organisms reveals that the A. tumefaciens 
proteome is most similar to that of two rhi- 
zobial species, S. meliloti and M. loti (14). 
This result was obtained by cataloging top 
BLAST hits of predicted A. tumefaciens pro- 
teins and by classifying predicted proteins 
into clusters of orthologous groups (Fig. 2A) 
(20). Of the two rhizobial species, the A. 
tumefaciens proteome is most similar to that 
of S. meliloti. Phylogenetic analyses of 
broadly conserved proteins indicate that this 
similarity results from A. tumefaciens and S. 
meliloti sharing a recent common ancestor, 
and not from gene loss or branch rate varia- 
tion (Fig. 2, B and C). 

Sinorhizobium meliloti has a circular 
chromosome (3.65 Mb) and two plasmids 
(1.68 Mb and 1.35 Mb), with a total genome 
size 1.1 Mb larger than that of A. tumefaciens 
(12). The circular chromosomes of these 
organisms show extensive nucleotide colinearity 
and gene order conservation (Fig. 3) (14). 
Previously, such extensive colinearity has 
only been seen between members of the same 
genus. Chromosome-wide conservation of 
gene order is less pronounced between S. 
meliloti and M. loti (14). The comparison of 
the circular chromosomes of A. tumefaciens 
and S. meliloti also reveals major rearrangements 
near the putative replication origin and 
termini (Fig. 3, regions A and B). Similarly 
located rearrangements are commonly seen 
between closely related bacteria (21). 

Fig. 1. Schematic representation 
of the A. tumefaciens genome. Chromosomes 
are drawn to scale with plasmids represented at 
5X or 10X magnification, as indicated. The 
outer two bands indicate opposing transcriptional 
orientations of predicted genes. Colors 
indicate orthology to proteins in the S. meliloti 
replicons: blue, chromosome; green, pSymA; 
gold, pSymB; red, nonorthologous. The inner 
circle depicts GC content for each coding region, 
with lower GC content indicated by darker 
shading. The vir and T-DNA regions of 
pTiC58 and the AT island of pAtC58 are indicated. 
Orthologs were identified by comparison 
of predicted proteins for each A. tumefaciens 
replicon with the genome of S. meliloti. Two 
proteins were considered orthologous if their 
BLASTP alignment covered at least 60% of 
each protein at an expect value of less than 
or equal to 10^-5. Proteins that did not match 
these criteria were considered nonorthologous (14). 
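
The orthology criterion stated in the Fig. 1 legend (a BLASTP alignment covering at least 60% of each protein at an expect value of at most 10^-5) can be expressed as a small filter. The sketch below is illustrative only: the `Hit` record and its field names are hypothetical, and the way alignment coverage is measured here (aligned residues divided by protein length) is an assumption, since the text does not spell it out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise BLASTP alignment (field names are hypothetical)."""
    query_len: int        # length of the A. tumefaciens protein, in residues
    subject_len: int      # length of the S. meliloti protein, in residues
    query_aligned: int    # query residues covered by the alignment
    subject_aligned: int  # subject residues covered by the alignment
    evalue: float         # BLASTP expect value


def is_ortholog(hit: Hit, min_cov: float = 0.60, max_evalue: float = 1e-5) -> bool:
    """Apply the coverage and expect-value thresholds from the legend."""
    query_cov = hit.query_aligned / hit.query_len
    subject_cov = hit.subject_aligned / hit.subject_len
    return (query_cov >= min_cov
            and subject_cov >= min_cov
            and hit.evalue <= max_evalue)


# Invented example values, for illustration only.
good = Hit(300, 320, 250, 255, 1e-40)
partial = Hit(300, 900, 250, 255, 1e-40)   # covers too little of the subject
weak = Hit(300, 320, 250, 255, 1e-3)       # expect value too high
print(is_ortholog(good), is_ortholog(partial), is_ortholog(weak))
```

Requiring coverage of both proteins, rather than the query alone, keeps a short protein from being called orthologous to a single domain of a much longer one.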

A comparison of the other replicons of A. 
tumefaciens with all replicons of S. meliloti 
reveals a mosaic pattern of ortholog distribu- 
tion (Table 2 and Fig. 1). These orthologs are 
distributed across the A. tumefaciens ele- 
ments as individual genes and small regions 
of gene order conservation. Two regions of 
the linear replicon exhibit extensive conser- 
vation of gene order with a segment of the S. 
meliloti chromosome (Fig. 3, region C). The 
first comprises 46 genes (44 kb) and the 
second contains 65 genes (89 kb). These re- 
gions are partially conserved in the M. loti 
chromosome. The large number of orthologs 
and the lack of extensive gene order conservation 
suggest that the smaller A. tumefaciens 
replicons underwent substantial rearrange- 
ment since the organisms diverged. This find- 
ing is consistent with differential evolution- 
ary pressures acting on these elements. The 
nonorthologous genes, many of which are 
seen on the Ti plasmid, reflect lineage-spe- 
cific gene loss or acquisition from other spe- 
cies. Taken together, these data support the 
recent evolutionary divergence of A. tumefa- 
ciens and S. meliloti. 

Genus-specific genes. Comparison of 
the genomes of A. tumefaciens, S. meliloti, 
and M. loti identified genes in each organism 
that likely contribute to genus-specific biology 
(14). Of the 5419 predicted A. tumefaciens 
proteins, 853 (16%) are not found in these 
other organisms. Of these, 97 have an assigned 
function, whereas 756 are hypothetical 
or conserved hypothetical. The predicted 
products of these genes are diverse and in- 
clude proteins involved in cellulose produc- 
tion, plasmid maintenance, cell growth, tran- 
scriptional regulation, and cell wall synthesis. 
Several additional proteins are predicted to 
catabolize plant cell wall materials, sugars, 
and exudates. These include polygalacturo- 
nases, a glycosidase, an endoglucanase, a 
myo-inositol catabolism protein, and a cell 
wall lysis-associated protein. Additional 
genes, predictably found on the Ti plasmid, 
include those encoding virulence, T-DNA, 
and conjugal transfer-associated proteins. 
With 756 open reading frames (ORFs) yet to 
characterize, much remains to be elucidated 
regarding the genetic distinction between A. 
tumefaciens and its Rhizobiaceae relatives. 

Linear chromosome. Linear replicons, 
the predominant genetic element in eu- 
karyotes, have been identified in only a few 
prokaryotes. These include members of the 
genera Borrelia and Streptomyces (22, 23). 
Although sequence analysis did not reveal 
distinct features associated with terminal secondary 
structures, Goodner et al. found that 
the termini of this replicon are covalently 
linked (15). This covalent linkage did not 
prevent nearly complete sequencing of the 
replicon termini as confirmed by Southern 
analysis (14). Proteins associated with the 
maintenance of linear ends in other systems, 
such as telomerases or the Streptomyces tpg 
proteins (24), are absent in A. tumefaciens. 
One notable feature of the replicon termini is 
the presence of IS elements near each end. 
The evolutionary origin of this replicon 
awaits investigation, as does the mechanism 
that A. tumefaciens uses to maintain it in a 
linear form. 

There are 1882 protein-coding genes on 
the linear replicon, including those encoding 
ribosomal and DNA replication proteins, as 



Table 1. General features of the A. tumefaciens C58 genome.

Feature                     Circular    Linear      pAtC58    pTiC58    Total
Size (bp)                   2,841,490   2,075,560   542,779   214,233   5,674,062
G+C content (%)             59.4        59.3        57.3      56.7      58.13
Protein-coding genes
  Assigned function         1715        1286        333       141       3475 (64.1%)
  Conserved hypothetical    710         353         128       45        1236 (22.8%)
  Hypothetical              364         243         89        12        708 (13.1%)
  Total                     2789        1882        550       198       5419
Average ORF size (bp)       892         988         843       925       922
Coding (%)                  87.9        89.9        85.4      85.5      88.3
Regulators (%)              7.7         10.4        11.8      5.1       9.0
ABC transport               47          80          20        6         153
RNA
  rRNA                      2           2           0         0         4
  tRNA                      40          13          0         0         53
  tmRNA                     1           0           0         0         1
IS elements                 2           10          10        2         24
Phage-related               10          1           1         0         12
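
The protein-coding rows of Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published counts; the variable names are illustrative and not part of the original analysis.

```python
# Gene counts transcribed from Table 1.
# Column order: circular chromosome, linear chromosome, pAtC58, pTiC58.
counts = {
    "assigned function": [1715, 1286, 333, 141],
    "conserved hypothetical": [710, 353, 128, 45],
    "hypothetical": [364, 243, 89, 12],
}

# Sum each column to get the per-replicon totals, then the genome total.
replicon_totals = [sum(column) for column in zip(*counts.values())]
genome_total = sum(replicon_totals)

# Share of each annotation class in the whole genome, in percent.
shares = {
    label: round(100 * sum(row) / genome_total, 1)
    for label, row in counts.items()
}

print(replicon_totals, genome_total, shares)
```

The per-replicon sums and the class percentages computed this way agree with the "Total" row and the bracketed percentages in the table.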
Fig. 2. Comparisons with fully sequenced genomes. (A) Distribution of best hits based on a
comparison of predicted proteins of A. tumefaciens with proteins from all published genomes. (B
and C) Phylogenetic trees generated using two broadly conserved proteins. The trees were
generated using PAUP distance methods and a distance calculation based on PAM matrices (14).



well as 21 complete metabolic pathways. The 
presence of these genes confirms the chromo- 
somal identity of this replicon. Additional 
features, however, resemble those tradition- 
ally associated with plasmids. For example, 
genes whose products are similar to the con- 
jugative proteins TraA, MobC, and TraG are 
present, although an oriT is not apparent. 
Further, an intact and highly conserved re- 
pABC operon, the definitive element of the 
RepABC-type replicator family of circular 
plasmids, is located near the center of the 
linear chromosome. The presence of this 
operon, coupled with a colocalized GC-skew 
inversion, indicates a bidirectional plasmid- 
like mode of replication. If experimentally 
verified, this replication mechanism would 
prove unique among known linear replicons. 
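
A GC-skew inversion of the kind mentioned above is commonly located by plotting cumulative (G - C)/(G + C) along the sequence; in many bacterial replicons the minimum of that curve falls near the replication origin. The following is a minimal sketch of that general technique, not the procedure used in this study. The window size and function names are assumptions chosen for illustration.

```python
def cumulative_gc_skew(seq, window=1000):
    """Cumulative (G - C) / (G + C) over consecutive, non-overlapping windows."""
    total, curve = 0.0, []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # skip windows with no G or C to avoid division by zero
            total += (g - c) / (g + c)
        curve.append(total)
    return curve


def skew_extremes(curve, window=1000):
    """End coordinates of the windows where the cumulative skew is lowest and highest."""
    lowest = min(range(len(curve)), key=curve.__getitem__)
    highest = max(range(len(curve)), key=curve.__getitem__)
    return (lowest + 1) * window, (highest + 1) * window


# Toy sequence whose skew flips sign in the middle, mimicking a central origin.
toy = "C" * 3000 + "G" * 3000
curve = cumulative_gc_skew(toy)
minimum_at, maximum_at = skew_extremes(curve)
print(curve, minimum_at, maximum_at)
```

On a real replicon the windows would be much larger and the curve noisier, so the minimum indicates a candidate region rather than an exact coordinate.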

Plasmid replication and transfer. Rep- 
lication of both pTiC58 and pAtC58 is me- 
diated by RepABC-type systems commonly 
found in plasmids of the Rhizobiaceae. It is 
likely that the origin of replication for these 
plasmids is adjacent to the repC gene (25). In 
contrast to the pSymB plasmid of S. meliloti 
(12), both A. tumefaciens plasmids contain all
necessary machinery for conjugation and do 
not contain essential genes. A new conjugal 
transfer system belonging to the Type IV 
secretion family (AvhB) (26) was identified 
on pAtC58. In contrast to the tight control of 
Ti plasmid conjugal transfer mediated by spe- 
cific opines that activate quorum sensing 
(27), the conjugal transfer of pAtC58 appears 
to be constitutive. 

Transport. Transporters constitute 15% 
of the A. tumefaciens genome, 87% of which 
are found on the chromosomes (14). These 
systems are predicted to confer broad capa- 
bilities for the transport of common nutrients 
found in the rhizosphere, including sugars, 
amino acids, and peptides. In addition, there 
are 11 LysE/RhtB amino acid efflux proteins,
almost double the number seen in any bacte- 
rium outside of the Rhizobiaceae (12, 28). 
These transporters may function in the export 
of homoserine lactones or other signal mole- 
cules. There are also a large number of high- 
affinity tripartite ATP-independent periplas- 
mic (TRAP) dicarboxylate transporters (29). 
Our analyses indicate that A. tumefaciens and 
the other sequenced members of the Rhizo- 
biaceae have similar transport capabilities. 

Like both S. meliloti and M. loti, A. tume-
faciens has an abundance of ABC transport-
ers, constituting 60% of its total transporter
complement. There are 153 complete systems
plus additional "orphan" subunits. The num- 
ber of ABC transporters found in these or- 
ganisms is greater than that found in any 
sequenced eukaryote and more than double 
the number found in any sequenced bacteri- 
um (28, 30). Predicted substrates of these 
ABC transport systems include sugars (53 
systems), amino acids (29 systems), and pep-
tides (25 systems). Other organisms with
large ABC transporter complements include 
photosynthetic bacteria such as Synechocystis 
PCC6803 and organisms that lack a tricar- 
boxylic acid (TCA) cycle and an electron 
transfer chain, like the mycoplasmas and 
Thermotoga maritima (28). The generation of 
large ATP pools in these organisms, via pho- 
tosynthesis or F-type ATPases, may explain 
their preference for ATP-driven transport. In 
contrast, the preference for ABC transporters 
in A. tumefaciens may reflect a need for 
high-affinity uptake systems for the acquisi- 
tion of nutrients in the highly competitive soil 
and rhizosphere environments. 

Regulation. Bacteria that inhabit diverse 
environments tend to have large complements 
of regulatory genes (31). Consistent with this, 
regulatory genes constitute a substantial propor- 
tion (9%) of the A. tumefaciens genome (Table 
1) (14). This regulatory capacity likely facili- 
tates survival of A. tumefaciens within the dy- 
namic soil and rhizosphere environments. The 
genome encodes 11 extracellular sigma factors,
proteins implicated in stress responses in other 
organisms (32). In addition, although several 
LuxR family motifs are evident, only one pre-
viously identified acyl-homoserine lactone syn-
thase (traI) known to be involved in quorum
sensing was detected (4). Several proteins are 
similar to eukaryotic signal transduction pro- 
teins rarely found in bacteria, including four 
regucalcin-like calcium-binding regulators and 
a serine-threonine kinase. As is true of other 
α-proteobacteria, no rpoS gene was identified.
However, A. tumefaciens does have a homolog 
of the HF-I protein known to regulate station-
ary phase and oxidative stress responses in
Escherichia coli and Brucella abortus (33). 

Our analysis identified numerous nucleo- 
tide cyclases in the plant symbionts S. me- 
liloti and M. loti (25 and 12, respectively) and 
in the evolutionarily distinct human pathogen 
Mycobacterium tuberculosis (12). These cy- 
clases are rarely found in other bacterial ge- 
nomes. The nucleotide cyclases in S. meliloti 
have been noted previously and were postu- 
lated to function in signal transduction (12). 
Contrary to our expectation, there are only 
three nucleotide cyclases in A. tumefaciens. It 
is unclear why the nitrogen-fixing plant sym- 
bionts share similarly large numbers of nu- 
cleotide cyclases with a human pathogen, 
whereas few such genes are found in the 
evolutionarily related A. tumefaciens. 

Attachment, cell surface, and secre-
tion. The initial interaction of Agrobacte-
rium with its plant hosts is mediated by sev-
eral attachment-related genes (34). These in-
clude the chvA, chvB, exoC, and cellulose
synthesis genes as well as the pAtC58-local- 
ized att region. Several additional genes en- 
code proteins similar to adhesins in mamma- 
lian pathogens, including BfrA of E. coli and 
PsaA of Streptococcus pneumoniae. 



Pili are extracellular appendages often re- 
quired for bacterial association with their 
hosts. Although only the pilus encoded by the 
virB operon has been experimentally con-
firmed (35), the trb operon required for Ti
plasmid conjugation likely produces a pilus 
(36). The avhB and ctp clusters, identified by 
our analyses, may also produce pili. Addi- 
tional surface components include exopo- 
lysaccharides, lipopolysaccharides, and cap- 
sular polysaccharides, whose biosynthetic 
genes are primarily located on the linear 
chromosome. Such surface polysaccharides 
are commonly involved in invasion, growth, 
and survival of plant-associated bacteria. 

Five protein secretion systems are found 
among Gram-negative bacteria (37), at least
three of which are represented in A. tumefa- 
ciens. These include four potential type I 
secretion systems. Although components of 
the main terminal branch of type II secretion 
appear to be absent, the Sec system for pro- 
tein secretion across the inner membrane is 
intact. There is, however, a type IV pilus 
biogenesis system with components similar 
to those of type II secretion systems. Similar 
to S. meliloti (12), no type III secretion sys- 
tem was identified. Agrobacterium tumefa- 
ciens encodes three type IV secretion sys- 
tems: VirB, Trb (27), and AvhB. The genome 
also contains the twin arginine targeting sys- 
tem, Tat/Mtt (38). 

Virulence. To date, most virulence deter- 
minants of A. tumefaciens have been found 
on the Ti plasmid. Other than the virB oper- 



on, these genes are not found in S. meliloti. 
The TiC58 plasmid contains a single T-DNA 
region, in contrast to the two found in a 
number of other strains (18), and the 25- 
base pair (bp) border regions that delineate 
the T-DNA are not present elsewhere in the 
genome. 

The availability of the genome sequence 
has enabled the identification of genes whose 
products are similar to plant pathogen viru-
lence proteins required for host cell wall deg-
radation. These include pectinase (kdgF), lig-
ninase (ligE), and xylanase as well as regu-
lators of pectinase and cellulase production
(pecS/M); A. tumefaciens may use such en- 
zymes to breach the cell wall of its host 
before T-DNA transfer. 

In addition, we have identified numerous 
orthologs of animal virulence genes. Exam- 
ples include those involved in host survival, 
such as the bacA locus of Brucella (39) and 
two members of the widely conserved HtrA 
family of serine proteases implicated in re- 
sponse to oxidative stress in Salmonella and 
Yersinia (40). Interestingly, a bacA homolog 
is involved in S. meliloti symbiosis (41). In- 
vasion-related homologs include the ialA and 
ialB genes of Bartonella henselae (42) as 
well as five hemolysin-like proteins with as- 
sociated type I secretion systems. The highly 
conserved mviN gene, implicated in Salmo- 
nella virulence (43), is also present. 

Metabolism. Agrobacterium tumefa-
ciens grows on minimal medium and there-
fore possesses all pathways required for prototrophic

Fig. 3. Alignment of the proteomes of the S. meliloti chromosome and the A. tumefaciens circular
chromosome (x axis: A. tumefaciens circular chromosome, bp × 10⁶). Each point in the figure is a
bidirectional best hit. These hits were obtained by pairwise BLASTP searches of predicted
A. tumefaciens proteins against those of S. meliloti with a maximum expect value of 10⁻⁴ (14).
Putative origins (region A) and termini (region B) of replication are indicated, as well as a
sizable region lacking colinearity (region C).

growth, an observation confirmed
by our computational pathway analysis (14). 
These metabolic pathways are dispersed 
among the four replicons. Unlike their orga- 
nization in E. coli, most genes of these path- 
ways are not tightly clustered, which suggests 
that they are not present in operons. We 
identified pathways for the synthesis of all 20 
amino acids as well as numerous enzyme 
cofactors. At least one nonribosomal protein 
synthesis system for the production of 
polyketides was identified. Encoded energy 
metabolism pathways include glycolysis, 
TCA cycle, and Entner-Doudoroff. Agrobac-
terium tumefaciens can catabolize 17 amino
acids, including S-adenosylhomocysteine and
4-hydroxyproline. Pathways for use or deg- 
radation of plant metabolites typically found 
in the rhizosphere were also detected. These 
include sugars such as glucose, fructose, su- 
crose, ribose, xylose, xylulose, and lactose as 
well as compounds such as myo-inositol, hy- 
dantoin, urea, and glycerol. The capacity to 
metabolize glucuronate, galactonate, galact- 
arate, gluconate, ribitol, glycogen, quinate, 
L-idonate, creatinine, stachydrine, ribosylni- 
cotinamide, and 4-hydroxymandelate was 
also detected. Chemotaxis systems respond- 
ing to many of these compounds are present 
in A. tumefaciens (44).

Agrobacterium tumefaciens encodes a va-
riety of proteins that may protect against
toxic compounds in the environment. Exam- 
ples include four cytochrome p450s, two of 
which have been previously identified. One 
of these has been shown to modify ferulic
acid, an inducer of the vir genes (45). These 
highly oxidative enzymes may also detoxify 
or modify plant-derived compounds, includ- 
ing phytoalexins (46) and protocatechuate, 
and xenobiotics such as 1,2-dichloroethane, 
cyanate, 1,4-dichlorobenzene, and octane. In 
addition, antibiotic resistance genes targeted 
against tetracycline (47) and chlorampheni- 
col are present. 

Many components of nitrogen metabolism 
are conserved between A. tumefaciens and
the nitrogen-fixing symbionts S. meliloti and 
M. loti. Examples include components of the 
nitrogen regulation (Ntr) system such as 
ntrBC, ntrXY, ntrA, glnE, glnD, glnB, and



glnK. Agrobacterium tumefaciens harbors
seven glutamine synthetase (GS) genes, 
which encode GS types I, II, and III. The 
presence of multiple GS genes may relate to 
the observation that A. tumefaciens requires
high concentrations of glutamate for optimal 
growth. Other members of the Rhizobiaceae 
also contain multiple GS genes. In addition, 
A. tumefaciens has a gene predicted to encode 
the large hexameric adenosine monophos-
phate (AMP)-dependent glutamate dehydro-
genase (48), but a gene encoding the AMP-
independent glutamate dehydrogenase was
not identified. In contrast, S. meliloti and M.
loti contain both genes. The A. tumefaciens 
linear chromosome carries denitrification 
genes, including a periplasmic dissimilatory 
nitrate reductase (nap), nitrite reductase (nir),
and nitric oxide reductase (nor), but lacks 
nitrous oxide reductase (nos). In contrast, all 
of these genes are present in S. meliloti. 
Nitrate transport genes are also located on the 
linear chromosome. Although A. tumefaciens 
is considered an aerobe, the existence of these 
genes implies that it could use nitrate as an 
electron acceptor under anaerobic conditions. 
As expected, A. tumefaciens lacks the sub-
units of nitrogenase and its cofactors. Most of
the nod genes are also absent, except for three 
genes similar to those involved in nod factor 
production, nodL, nodX, and nodN. 

Conclusions. The combination of a lin- 
ear and a circular chromosome is found in 
only a few members of the genus Agrobac- 
terium (49). This observation represents a 
key evolutionary distinction between A. tu-
mefaciens and S. meliloti. On the basis of 16S
ribosomal DNA phylogenetic analyses, it has 
been proposed that the genus Agrobacterium 
be reclassified into the genus Rhizobium (10). 
Combining what has been elucidated regard- 
ing genome structure with the complete ge- 
nome sequence should allow a more accurate 
definition of the taxonomic position of A. 
tumefaciens in the Rhizobiaceae. 

One striking finding from our analysis is 
the extensive similarity of the circular chro- 
mosomes of A. tumefaciens and the plant 
symbiont S. meliloti, which supports the view 
that these bacteria originated from a recent 
common ancestor. Galibert et al. speculate 



Table 2. Number of orthologous genes of A. tumefaciens with respect to S. meliloti. The number of
orthologous genes is shown in bold, with the percentage of each A. tumefaciens replicon they represent
shown in square brackets. The remainder of the genes, which are not orthologs, are shown in the last row.
Numbers of putative protein coding genes for each replicon are shown in parentheses.

                                       A. tumefaciens
S. meliloti replicon                   Circular      Linear       pAtC58       pTiC58
                                       (2789)        (1882)       (550)        (198)
Chromosome (3341)                      1867 [67%]    673 [36%]    114 [21%]    30 [15%]
[replicon label illegible in source]   104 [4%]      218 [12%]    118 [21%]    34 [17%]
[replicon label illegible in source]   221 [8%]      478 [25%]    108 [20%]    23 [12%]
Nonorthologous                         597 [21%]     513 [27%]    210 [38%]    111 [56%]


that the S. meliloti chromosome was present 
in a progenitor that later acquired pSymA and 
pSymB (12). The mosaic structure of the A. 
tumefaciens linear chromosome and plas- 
mids, predominantly composed of orthologs 
found on each of the S. meliloti replicons, 
suggests that these organisms diverged after 
acquisition of the pSymA and pSymB ances- 
tral molecules by this progenitor. 
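
Ortholog assignments of this kind are often made with the bidirectional-best-hit criterion, the same criterion stated for the points in Fig. 3 (BLASTP, maximum expect value of 10⁻⁴). The sketch below illustrates that criterion on parsed hit tuples. It is an assumption-laden illustration rather than the authors' pipeline: the tuple layout, the use of bit score to rank hits, and the gene identifiers are all hypothetical.

```python
def best_hits(hits, max_evalue=1e-4):
    """Map each query to its highest-scoring subject among hits passing the E-value cutoff.

    hits: iterable of (query, subject, evalue, bitscore) tuples.
    """
    best = {}
    for query, subject, evalue, score in hits:
        if evalue > max_evalue:
            continue  # discard hits weaker than the expect-value threshold
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a, max_evalue=1e-4):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical identifiers, for illustration only.
a_vs_b = [
    ("atu1", "sm1", 1e-50, 200.0),
    ("atu1", "sm2", 1e-10, 60.0),
    ("atu2", "sm2", 1e-30, 120.0),
    ("atu3", "sm1", 1e-3, 30.0),  # fails the 1e-4 cutoff
]
b_vs_a = [
    ("sm1", "atu1", 1e-48, 198.0),
    ("sm2", "atu1", 1e-9, 58.0),
    ("sm2", "atu2", 1e-28, 118.0),
]
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Requiring reciprocity is deliberately conservative: a gene whose best match already pairs with a closer partner is left unassigned rather than counted as an ortholog.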

Recent models of bacterial evolution sug- 
gest that the differential acquisition and loss of 
genes in organisms that inhabit the same envi- 
ronment allows divergence into symbiotic and 
pathogenic lifestyles (50). The acquisition of 
such elements is apparent in both A. tumefa-
ciens and S. meliloti. The nod genes of S.
meliloti (12, 51), as well as the vir genes and
T-DNA of A. tumefaciens, display GC content 
and codon usage distinct from the rest of the 
genome, which suggests recent evolutionary 
acquisition. In the case of the T-DNA, reduced 
GC content may facilitate expression in the 
plant host, where lower GC content is common. 
Moreover, none of the T-DNA and few of the 
vir genes of A. tumefaciens have orthologs in S. 
meliloti, and most nod genes are not found in A. 
tumefaciens. Differential selection and mainte- 
nance of such horizontally acquired genes like- 
ly led to the divergence into pathogenic and 
symbiotic states. Thus, these organisms provide 
a rich model system for further investigations 
into the evolutionary divergence of pathogens 
and symbionts. 

As the central biological tool in the genera- 
tion of transgenic plants for research and agri- 
culture, A. tumefaciens, and the availability of 
its genome sequence, will continue to have an 
impact on plant biotechnology. Detailed stud- 
ies, supplemented by this sequence, should lead 
to a directed refinement of plant transformation 
that increases both the host range and transfor- 
mation efficiency of this versatile genetic tool. 
Genes likely to be targeted by such work in- 
clude potential virulence factors that are shared 
between plant and animal pathogens. Examina- 
tion of these genes in the genetically tractable 
Agrobacterium system may also serve to eluci- 
date the molecular role they play in animal 
pathogens. It is our hope that this work will 
broaden the scientific foundation from which to 
address the worldwide debate over the produc- 
tion, use, and safety of genetically modified 
organisms. 
corrections

Emergence of symbiosis in 
peptide self-replication 
through a hypercyclic network 

David H. Lee, Kay Severin, Yohei Yokobayashi 
& M. Reza Ghadiri 

Nature 390, 591-594 (1997)

Hypercycles are based on second-order (or higher) autocatalysis 
and defined by two or more replicators that are connected by



another superimposed autocatalytic cycle. Our study describes a 
mutualistic relationship between two replicators, each catalysing the 
formation of the other, that are linked by a superimposed catalytic 
cycle. Although the kinetic data suggest the intermediary of higher- 
order species in the autocatalytic processes, the present system 
should not be referred to as an example of a minimal hypercycle 
in the absence of direct experimental evidence for the autocatalytic
cross-coupling between replicators.



The complete genome 
sequence of the 
hyperthermophilic, 
sulphate-reducing archaeon 
Archaeoglobus fulgidus 

Hans-Peter Klenk, Rebecca A. Clayton, Jean-Francois Tomb, 
Owen White, Karen E. Nelson, Karen A. Ketchum, 
Robert J. Dodson, Michelle Gwinn, Erin K. Hickey, 
Jeremy D. Peterson, Delwood L. Richardson, 
Anthony R. Kerlavage, David E. Graham, Nikos C. Kyrpides, 
Robert D. Fleischmann, John Quackenbush, Norman H. Lee, 
Granger G. Sutton, Steven Gill, Ewen F. Kirkness, 
Brian A. Dougherty, Keith McKenney, Mark D. Adams, 
Brendan Loftus, Scott Peterson, Claudia I. Reich, 
Leslie K. McNeil, Jonathan H. Badger, Anna Glodek, 
Lixin Zhou, Ross Overbeek, Jeannine D. Gocayne, 
Janice F. Weidman, Lisa McDonald, Teresa Utterback, 
Matthew D. Cotton, Tracy Spriggs, Patricia Artiach, 
Brian P. Kaine, Sean M. Sykes, Paul W. Sadow, 
Kurt P. D'Andrea, Cheryl Bowman, Claire Fujii, 
Stacey A. Garland, Tanya M. Mason, Gary J. Olsen, 
Claire M. Fraser, Hamilton O. Smith, Carl R. Woese 
& J. Craig Venter 

Nature 390, 364-370 (1997) 

The pathway for sulphate reduction is incorrect as published: in 
Fig. 3 on page 367, adenylyl sulphate 3'-phosphotransferase (cysC) is
not needed in the pathway as outlined, as adenylyl sulphate 
reductase (aprAB) catalyses the first step in the reduction of adenylyl 
sulphate. The correct sequence of reactions is: sulphate is first 
activated to adenylyl sulphate, then reduced to sulphite and subse- 
quently to sulphide. The enzymes catalysing these reactions are: 
sulphate adenylyltransferase (sat), adenylylsulphate reductase 
(aprAB), and sulphite reductase (dsrABD). We thank Jens-Dirk
Schwenn for bringing this error to our attention.
Archaeoglobus fulgidus is the first sulphur-metabolizing organism to have its genome sequence determined. Its
genome of 2,178,400 base pairs contains 2,436 open reading frames (ORFs). The information processing systems and
the biosynthetic pathways for essential components (nucleotides, amino acids and cofactors) have extensive
correlation with their counterparts in the archaeon Methanococcus jannaschii. The genomes of these two Archaea
indicate dramatic differences in the way these organisms sense their environment, perform regulatory and transport
functions, and gain energy. In contrast to M. jannaschii, A. fulgidus has fewer restriction-modification systems, and
none of its genes appears to contain inteins. A quarter (651 ORFs) of the A. fulgidus genome encodes functionally
uncharacterized yet conserved proteins, two-thirds of which are shared with M. jannaschii (428 ORFs). Another
quarter of the genome encodes new proteins, indicating substantial archaeal gene diversity.



Biological sulphate reduction is part of the global sulphur cycle, 
ubiquitous in the Earth's anaerobic environments, and is essential to
the basal workings of the biosphere. Growth by sulphate reduction 
is restricted to relatively few groups of prokaryotes; all but one of 
these are Eubacteria, the exception being the archaeal sulphate
reducers in the Archaeoglobales 1,2 . These organisms are unique in 
that they are unrelated to other sulphate reducers, and because they 
grow at extremely high temperatures 3 . The known Archaeoglobales 
are strict anaerobes, most of which are hyperthermophilic marine 
sulphate reducers found in hydrothermal environments 2,4 and in 
subsurface oil fields 5 . High-temperature sulphate reduction by
Archaeoglobus species contributes to deep subsurface oil-well 'sour- 
ing' by producing iron sulphide, which causes corrosion of iron and 
steel in oil- and gas-processing systems 5 . 

Archaeoglobus fulgidus VC-16 (refs 2, 4) is the type strain of the 
Archaeoglobales. Cells are irregular spheres with a glycoprotein 
envelope and monopolar flagella. Growth occurs between 60 and 
95 °C, with optimum growth at 83 °C and a minimum division time 
of 4 h. The organism grows organoheterotrophically using a variety
of carbon and energy sources, but can grow lithoautotrophically on 
hydrogen, thiosulphate and carbon dioxide 6 . We sequenced the 
genome of A. fulgidus strain VC-16 as an example of a sulphur- 
metabolizing organism and to gain further insight into the Archaea 7,8 
through genomic comparison with Methanococcus jannaschii 9 .

General features of the genome 

The genome of A. fulgidus consists of a single, circular chromosome 
of 2,178,400 base pairs (bp) with an average of 48.5% G+C content 



(Fig. 1). There are three regions with low G+C content (<39%), two 
rich in genes encoding enzymes for lipopolysaccharide (LPS) 
biosynthesis, and two regions of high G+C content (>53%), 
containing genes for large ribosomal RNAs, proteins involved in 
haem biosynthesis (hemAB), and several transporters (Table 1).
Because the origins of replication in Archaea are not characterized, 
we arbitrarily designated base pair one within a presumed non- 
coding region upstream of one of three areas containing multiple 
short repeat elements. 

Open reading frames. Two independent coding analysis programs 
and BLASTX 10 searches (see Methods) predicted 2,436 ORFs (Figs 1, 
2, Tables 1, 2) covering 92.2% of the genome. The average size of the 
A. fulgidus ORFs is 822 bp, similar to that of M. jannaschii (856 bp), 
but smaller than that in the completely sequenced eubacterial 
genomes (949 bp). All ORFs were searched against a non-redundant 
protein database, resulting in 1,797 putative identifications that 
were assigned biological roles within a classification system adapted 
from ref. 11. Predicted start codons are 76% ATG, 22% GTG and 2% 
TTG. Unlike M. jannaschii, where 18 inteins were found in coding
regions, no inteins were identified in A. fulgidus. Compared with M. 
jannaschii, A. fulgidus contains a large number of gene duplications,
contributing to its larger genome size. The average protein relative 
molecular mass (M r ) in A. fulgidus is 29,753, ranging from 1,939 
to 266,571, similar to that observed in other prokaryotes. The 
isoelectric point (pI) of predicted proteins among sequenced
prokaryotes exhibits a bimodal distribution with peaks at pIs of
approximately 5.5 and 10.5. The exceptions to this are Mycoplasma
genitalium in which the distribution is skewed towards high pI
Figure 1 Circular representation of the
A. fulgidus genome. The outer circle
shows predicted protein-coding regions
on the plus strand classified by function
according to the colour code in Fig. 2
(except for unknowns and hypotheticals,
which are in black). Second circle shows
predicted protein-coding regions on the
minus strand. Third and fourth circles
show IS elements (red) and other repeats
(green) on the plus and minus strand.
Fifth and sixth circles show tRNAs (blue),
rRNAs (red) and sRNAs (green) on the
plus and minus strand, respectively.



Table 1 Genome features

General
  Chromosome size:                                  2,178,400 bp
  Protein coding regions:                           92.2%

Predicted protein coding sequences:                 2,436 (1.1 per kb)
  Identified by database match:                     1,797
    putative function assigned:                     1,096
    homologues of M. jannaschii ORFs:               916
    conserved hypothetical proteins:                651
  No database match:                                639
  Members of 242 paralogous families:               719
  Members of 158 families with known functions:     [value missing in source]

Stable RNAs                                         Coordinates
  16S rRNA:                                         1,790,478-1,788,987
  23S rRNA:                                         1,788,751-1,785,820
  5S rRNA:                                          81,144-81,021
  7S RNA:                                           798,067-798,376
  RNase P:                                          86,281-86,032
  46 species of tRNA:                               no significant clusters
  tRNAs with 15-62 bp introns:                      Asp(GUC), Glu(UUC), Leu(CAA), Trp(CCA), Tyr(GUA)

Distinct G+C content regions                        Coordinates
  HGC-1, >53% G+C                                   1,786,000-1,797,000
  HGC-2, >53% G+C                                   2,158,000-2,159,000
  LGC-1, <39% G+C                                   281,000-284,000
  LGC-2, <39% G+C                                   544,000-550,000
  LGC-3, <39% G+C                                   1,175,000-1,177,000

Short, non-coding repeats                           Coordinates
  SR-1A, CTTTCAATCCCATTTTGGTCTGATTTCAAC             147-4,213
  SR-1B, CTTTCAATCCCATTTTGGTCTGATTTCAAC             398,368-401,590
  SR-2, CTTTCAATCTCCATTTTCAGGGCCTCCCTTTCTTA         1,690,930-1,694,104

Long, coding repeats                                Length      Copy number
  LR-01 NADH-flavin oxidoreductase                  1,886 bp    2 copies
  LR-02 NifS, NifU + ORF                            1,549 bp    2 copies
  LR-03 ISA1214 putative transposase + ISORF2       1,214 bp    6 copies
  LR-04 ISA1083 putative transposase + ISORF2       1,083 bp    3 copies
  LR-05 type II secretion system protein            1,014 bp    4 copies
  LR-06 ISA0963 putative transposase                963 bp      7 copies
  LR-07 homologue of MJ0794                         836 bp      3 copies
  LR-08 conserved hypothetical protein              696 bp      2 copies
  LR-09 conserved hypothetical protein              628 bp      2 copies
(median, 9.8) and A. fulgidus where the skew is toward low pI
(median, 6.3). 
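
The headline figures quoted for the A. fulgidus genome can be related to one another with simple arithmetic. The sketch below uses only numbers stated in the text and in Table 1; the variable names are illustrative, and the coding fraction it derives is an approximation because the mean ORF size is itself a rounded value.

```python
genome_bp = 2_178_400   # chromosome size
orf_count = 2_436       # predicted ORFs
mean_orf_bp = 822       # average ORF size reported in the text
identified = 1_797      # ORFs with a database match

# Gene density in ORFs per kilobase (Table 1 reports 1.1 per kb).
orfs_per_kb = orf_count / (genome_bp / 1000)

# Approximate coding fraction from count x mean size (the text reports 92.2%).
coding_fraction = orf_count * mean_orf_bp / genome_bp

# ORFs without a database match (Table 1 reports 639).
no_match = orf_count - identified

print(round(orfs_per_kb, 2), round(100 * coding_fraction, 1), no_match)
```

The density and the no-match count reproduce the tabulated values exactly, while the estimated coding fraction comes out slightly below the reported 92.2%, as expected from rounding of the mean ORF size.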

Multigene families. In A. fulgidus 719 genes (30% of the total) 
belong to 242 families with two or more members (Table 1). Of 
these families, 157 contained genes with biological roles. Most of 
these families contain genes assigned to the 'energy metabolism',
'transport and binding proteins', and 'fatty acid and phospholipid
metabolism' categories (Table 2). The superfamily of ATP-binding 
subunits of ABC transporters is the largest, containing 40 members. 
The importance of catabolic degradation and signal recognition 
systems is reflected by the presence of two large superfamilies: acyl- 
CoA ligases and signal-transducing histidine kinases. A. fulgidus 
does not contain a homologue of the large 16-member family found 
in M. jannaschii 9 . 

Repetitive elements. Three regions of the A. fulgidus genome
contain short (<40 bp) direct repeats (Table 1). Two regions (SR- 
1A and SR-1B) contain 48 and 60 copies, respectively, of an identical
30-bp repeat interspersed with unique sequences averaging 40 bp. 
The third region (SR-2) contains 42 copies of a 37-bp repeat similar 
in sequence to the SR-1 repeat and interspersed with unique 
sequence averaging 41 bp. These repeated sequences are similar to 
the short repeated sequences found in M. jannaschii. 

Nine classes of long (>500 bp) repeated sequences with ≥95%
sequence identity were found (LR1-LR9; Table 1). LR-3 is a novel 
element with 14-bp inverted repeats and two genes, one of which
has weak similarity to a transposase from Halobacterium salinarium. 
One copy of LR-3 interrupts AF2090, a homologue of a large 
M. jannaschii gene encoding a protein of unknown function. LR-4 
and LR-6 encode putative transposases not identified in M. jan-
naschii that may represent IS elements. The remaining LR elements
are not similar to known IS elements. 

Central intermediary and energy metabolism 

Sulphur oxide reduction may be the dominant respiratory process 
in anaerobic marine and freshwater environments, and is an 
important aspect of the sulphur cycle ih anaerobic ecosystems 12 . 
In this pathway, sulphate (SO^ )is first activated to adenylylsulphate 
(adenosine-5'-phosphosulphate; APS), then reduced to sulphite 
and subsequently to sulphide 1 ' 13 (Fig. 3). The most important 
enzyme in dissimilatoty sulphate reduction, adenylylsulphate 
reductase, reduces the activated sulphate to sulphite, releasing 
AMP. In A fulgidusy the APS reductase has a high degree of 
similarity, and identical physiological properties to APS reductases 
in sulph|te r reducing delta proteobactcria 14 . A desulphoviridin-type 
sulphite reductase then adds six electrons to sulphite to produce 
sulphide. As in the Eubacteria, three sulphite- reductase genes, 
dsrABD, constitute an operon. The genes for adenylylsulphate 
reductase and sulphate adenylyltransferase reside in a separate 
operon. In A. fulgidus y sulphate can be replaced as an electron 
acceptor by both thiosulphate (S 2 03~) and sulphite (SO^ - ), but not 
by elemental sulphur. 
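
The three steps of dissimilatory sulphate reduction described above can be summarized as a reaction scheme. The scheme is schematic: it takes the enzymes and gene names from the text (sat, aprAB, dsrABD), while the pyrophosphate released in the activation step and the proton bookkeeping in the last step follow standard textbook stoichiometry rather than anything stated in this article.

```latex
\begin{align*}
\mathrm{SO_4^{2-} + ATP} &\longrightarrow \mathrm{APS + PP_i}
  && \text{sulphate adenylyltransferase (\textit{sat})} \\
\mathrm{APS + 2\,e^-} &\longrightarrow \mathrm{SO_3^{2-} + AMP}
  && \text{adenylylsulphate reductase (\textit{aprAB})} \\
\mathrm{SO_3^{2-} + 6\,e^- + 8\,H^+} &\longrightarrow \mathrm{H_2S + 3\,H_2O}
  && \text{sulphite reductase (\textit{dsrABD})}
\end{align*}
```

Taken together, the two reductive steps account for the eight electrons needed to take sulphur from the +6 oxidation state in sulphate to −2 in sulphide.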

A. fulgidus VC-16 has been shown to use lactate, pyruvate, 
methanol, ethanol, 1-propanol and formate as carbon and energy 
sources 2 . Glucose has been described as a carbon source 1 , but neither 
an uptake-transporter nor a catabolic pathway could be identified. 
Although it has been reported that A. fulgidus is incapable of growth 
on acetate 6 , multiple genes for acetyl-CoA synthetase (which con- 
verts acetate to acetyl-CoA) were found. The organism may degrade 
a variety of hydrocarbons and organic acids because of the presence 
of 57 β-oxidation enzymes, at least one lipase, and a minimum of
five types of ferredoxin-dependent oxidoreductases (Fig. 3). The
predicted β-oxidation system is similar to those in Eubacteria and
mitochondria, and has not previously been described in the 
Archaea. Escherichia coli requires both the fadD and fadL gene 
products to import long-chain fatty acids across the cell envelope
into the cytosol 15 . A. fulgidus has 14 acyl-CoA ligases related to 
I FadD, but as expected given that it has no outer membrane, no 



FadL. In E. co/i, FadB has several metabolic functions, but in A. 
fulgidus these functions seem to be distributed among separate 
enzymes. For example, AF0435 encodes an orthologue of enoyl- 
CoA hydratase and resembles the amino-terminal domain of FadB. 
This gene is immediately upstream of a gene encoding an ortholo- 
gue of 3-hydroxyacyl-CoA dehydrogenase that resembles the car- 
boxy-terminal domain of FadB. 

Acetyl-CoA is degraded by A. fulgidus through a C₁ pathway, not
by the citric acid cycle or glyoxylate bypass 6,16,17. This degradation is
catalysed through the carbon monoxide dehydrogenase (CODH)
pathway that consists of a five-subunit acetyl-CoA decarbonylase/
synthase complex (ACDS) and five enzymes that are typically
involved in methanogenesis 18. In A. fulgidus, however, reverse
methanogenesis occurs, resulting in CO₂ production. All of the
enzymes and cofactors of methanogenesis from formylmethanofuran
to N⁵-methyltetrahydromethanopterin are used, but the
absence of methyl-CoM reductase eliminates the possibility of
methane production by conventional pathways. Production of
trace amounts of methane (<0.1 μmol ml⁻¹) 19 is probably a result
of the reduction of N⁵-methyltetrahydromethanopterin to methane
and tetrahydromethanopterin by carbon monoxide (CO) dehydrogenase.

A. fulgidus also contains genes suggesting it has a second CO
dehydrogenase system, homologous to that which enables
Rhodospirillum rubrum to grow without light using CO as its sole
energy source. Genes were detected for the nickel-containing CO
dehydrogenase (CooS), an iron-sulphur redox protein, and a
protein associated with the incorporation of nickel in CooS.
These represent elements of a system that could catalyse the
conversion of CO and H₂O to CO₂ and H₂.

In contrast to M. jannaschii, A. fulgidus contains genes representing 
multiple catabolic pathways. Systems include CoA-SH-dependent 
ferredoxin oxidoreductases specific for pyruvate, 2-ketoisovalerate, 
2-ketoglutarate and indolepyruvate, as well as a 2-oxoacid with little
substrate specificity 20,21. Four genes with similarity to the tungsten-containing
aldehyde ferredoxin oxidoreductase were also found 22.

Biochemical pathways characteristic of eubacterial metabolism, 
including the pentose-phosphate pathway, the Entner-Doudoroff 
pathway, glycolysis and gluconeogenesis, are either completely 
absent or only partly represented (Fig. 3). A. fulgidus does not 
have typical eubacterial polysaccharide biosynthesis machinery, yet 
it has been shown to produce a protein and carbohydrate-containing
biofilm 23. Nitrogen is obtained by importing inorganic molecules
or degrading amino acids (Fig. 3); neither a glutamate
dehydrogenase nor a relevant fix or nif gene is present.

The F₄₂₀H₂:quinone oxidoreductase complex 24 is recognized as



Figure 2 Linear representation of the A. fulgidus genome illustrating the location
of each predicted protein-coding region, RNA gene, and repeat element in the
genome. Symbols for the transporters are as follows: AsO, arsenite; COH, sugar;
Pi, phosphate; aa2, dipeptide; NH4, ammonium; a/o, arginine/lysine/ornithine;
s/p, spermidine/putrescine; glyc, glycerol; Cl⁻, chloride; Fe²⁺, iron(II); Fe³⁺, iron(III);
I, L, V, branched-chain amino acids; P, proline; pan, pantothenate; rib, ribose; lac,
lactate; Mg²⁺/Co²⁺, magnesium and cobalt; gln, glutamine; NO₃⁻, nitrate; ox/for,
oxalate/formate; main, malonic acid; Hg²⁺, mercury; phs, polysaccharide; SO₄²⁻,
sulphate; OCN⁻, cyanate; hex, hexuronate; phs, polysialic acid; K⁺, potassium
channel; H⁺/Na⁺, sodium/proton antiporter; Na⁺/Cl⁻, sodium- and chloride-dependent
transporter; P/G, osmoprotection protein; Cu²⁺, copper-transporting
ATPase; +?, cation-transporting ATPase; ?, ABC-transporter without known
function. Triplets associated with tRNAs represent the anticodon sequence.
Numbers associated with GES represent the number of membrane-spanning
domains (MSDs) according to the Goldman, Engelman and Steitz scale as
determined by TopPred 39. Genes whose identification is based on genes in
M. jannaschii are indicated by circles. Of the 236 proteins containing at least
one MSD, 124 had two or more MSDs.
the main generator of proton-motive force. However, our analysis
indicates the presence of heterodisulphide reductase and several
molybdopterin-binding oxidoreductases, with polysulphide, nitrate,
dimethyl sulphoxide, and thiosulphate as potential substrates,
which might contribute to energizing the cell membrane. A. fulgidus
contains a large number of flavoproteins, iron-sulphur proteins
and iron-binding proteins that contribute to the general intracellular
flow of electrons (Fig. 3). Detoxification enzymes include a
peroxidase/catalase, an alkyl-hydroperoxide reductase, arsenate
reductase, and eight NADH oxidases, presumably catalysing the
Figure 3 An integrated view of metabolism and solute transport in A. fulgidus.
Biochemical pathways for energy production, biosynthesis of organic compounds,
and degradation of amino acids, aldehydes and acids are shown with the central
components of A. fulgidus metabolism, sulphate, lactate and acetyl-CoA, highlighted.
Pathways or steps for which no enzymes were identified are represented by a red
arrow. A question mark is attached to pathways that could not be completely
elucidated. Macromolecular biosynthesis of RNA, DNA and ether lipids have
been omitted. Membrane-associated reactions that establish the proton-motive
force (PMF) and generate ATP (electron transport chain and V₁V₀-ATPase) are
linked to cytosolic pathways for energy production. The oxalate-formate
antiporters (oxlT) may also contribute to the PMF by mediating electrogenic
anion exchange. Each gene product with a predicted function in ion or solute
transport is illustrated. Proteins are grouped by substrate specificity with
transporters for cations (green), anions (red), carbohydrates/organic alcohols/
acids (yellow), and amino acids/peptides/amines (blue) depicted. Ion-coupled
permeases are represented by ovals (mael, exuT, panF, lctP, arsB, cynX,
napA/nhe2, amt, feoB, trkAH, cat and putP encode transporters for malate,
hexuronate, pantothenate, lactate, arsenite, cyanate, sodium, ammonium, iron
(II), potassium, arginine/lysine and proline, respectively). ATP-binding cassette
(ABC) transport systems are shown as composite figures of ovals, diamonds and
circles (proVWX, glnHPQ, dppABCDF, potABCD, braCDEFG, hemUV, nrtBC, cysAT,
pstABC, rbsAC, rfbAB correspond to gene products for proline, glutamine, dipeptide,
spermidine/putrescine, branched-chain amino acids, iron (III), nitrate, sulphate,
phosphate, ribose and polysialic acid transport, respectively). All other porters
drawn as rectangles (glpF, glycerol uptake facilitator; copB, copper transporting
ATPase; corA, magnesium and cobalt transporter). Export and import of solutes is
designated by arrows. The number of paralogous genes encoding each protein is
indicated in brackets for cytoplasmic enzymes, or within the figure for transporters.
Abbreviations: acs, acetyl-CoA synthetase; aor, aldehyde ferredoxin oxidoreductase;
aprAB, adenylylsulphate reductase; aspBC, aspartate aminotransferase; cdh,
acetyl-CoA decarbonylase/synthase complex; cysC, adenylylsulphate 3-phosphotransferase;
dld, D-lactate dehydrogenase; dsrABD, sulphite reductase; eno,
enolase; fadA/acaB, 3-ketoacyl-CoA thiolase; fadD, long-chain-fatty-acid-CoA
ligase; fad, enoyl-CoA hydratase; fadE (acd), acyl-CoA dehydrogenase; glpA,
glycerol-3-phosphate dehydrogenase; glpK, glycerol kinase; gltB, glutamate
synthase; hbd, 3-hydroxyacyl-CoA dehydrogenase; ilvE, branched-chain amino-acid
aminotransferase; iorAB, indolepyruvate ferredoxin oxidoreductase; korABDG,
2-ketoglutarate ferredoxin oxidoreductase; //^, L-lactate dehydrogenase; mcmA,
methylmalonyl-CoA mutase; mdhA, L-malate dehydrogenase; oadAB, oxaloacetate
decarboxylase; orAB, 2-oxoacid ferredoxin oxidoreductase; pflD, pyruvate
formate lyase 2; porABDG, pyruvate ferredoxin oxidoreductase; ppsA,
phosphoenolpyruvate synthase; prsA, ribose-phosphate pyrophosphokinase; sucAB,
2-ketoglutarate dehydrogenase; sat, sulphate adenylyltransferase; TCA,
tricarboxylic acid cycle; vorABDG, 2-ketoisovalerate ferredoxin oxidoreductase.
four-electron reduction of molecular oxygen to water, with the
concurrent regeneration of NAD.

Transporters 

A. fulgidus may synthesize several transporters for the import of
carbon-containing compounds, probably contributing to its ability
to switch from autotrophic to heterotrophic growth 5. Both M.
jannaschii and A. fulgidus have branched-chain amino-acid ABC
transport systems and a transporter for the uptake of arginine and
lysine. A. fulgidus encodes proteins for dipeptide, spermidine/
putrescine, proline/glycine-betaine and glutamine uptake, as well
as transporters for sugars and acids, rather like the membrane
systems described in eubacterial heterotrophs. These compounds
provide the necessary substrates for numerous biosynthetic and
degradative pathways (Fig. 3).

Many A. fulgidus redox proteins are predicted to require iron.
Correspondingly, iron transporters have been identified for the
import of both oxidized (Fe³⁺) and reduced (Fe²⁺) forms of iron.
There are duplications in functional and regulatory genes in both
systems. The uptake of Fe³⁺ may depend on haemin or a haemin-like
compound because A. fulgidus has orthologues to the eubacterial
hem transport system proteins, HemU and HemV. A. fulgidus
may also use the regulatory protein Fur to modulate Fe³⁺ transport;
this protein is not present in M. jannaschii. Fe²⁺ uptake occurs
through a modified Feo system containing FeoB. This is the third
example of an isolated feoB gene: M. jannaschii and Helicobacter
pylori also appear to lack feoA, implying that FeoA is not essential
for iron transport in these organisms.

A complex suite of proteins regulates ionic homeostasis. Ten
distinct transporters facilitate the flux of the physiological ions K⁺,
Na⁺, NH₄⁺, Mg²⁺, Fe²⁺, Fe³⁺, NO₃⁻, SO₄²⁻ and inorganic phosphate
(Pi). Most of these transporters have homologues in M. jannaschii
and are therefore likely to be critical for nutrient acquisition during
autotrophic growth. A. fulgidus has additional ion transporters for
the elimination of toxic compounds including copper, cyanate and
arsenite. As in M. jannaschii, the A. fulgidus genome contains two
paralogous operons of cobalamin biosynthesis-cobalt transporters,
cbiMQO.

Sensory functions and regulation of gene expression 

Consistent with its extensive energy-producing metabolism and
versatile system for carbon utilization, A. fulgidus has complex
sensory and regulatory networks. These networks contain over 55
proteins with presumed regulatory functions, including members
of the ArsR, AsnC and Sir2 families, as well as several iron-dependent
repressor proteins. There are at least 15 signal-transducing
histidine kinases, but only nine response regulators; this
difference suggests there is a high degree of cross-talk between
kinases and regulators. Only four response regulators appear to be
in operons with histidine kinases, including those in the methyl-directed
chemotaxis system (Che), which lies adjacent to the
flagellar biosynthesis operon. Although rich in regulatory proteins,
A. fulgidus apparently lacks regulators for response to amino-acid
and carbon starvation as well as to DNA damage. Finally, A. fulgidus
contains a homologue of the mammalian mitochondrial benzodiazepine
receptor, which functions as a sensor in signal-transduction
pathways 25. These receptors have been previously identified only in
Proteobacteria and Cyanobacteria 25.

Replication, repair and cell division 

A. fulgidus possesses two family B DNA polymerases, both related to
the catalytic subunit of the eukaryal delta polymerase, as previously
observed in the Sulfolobales 26. It also has a homologue of the
proofreading ε subunit of E. coli Pol III, not previously observed
in the Archaea. The DNA repair system is more extensive than that
found in M. jannaschii, including a homologue of the eukaryal
Rad25, a 3-methyladenine DNA glycosylase, and exodeoxynuclease
III. As well as reverse gyrase, topoisomerase I (ref. 9), and
topoisomerase VI (ref. 27), the genes for the first archaeal DNA gyrase
were identified.

A. fulgidus lacks a recognizable type II restriction-modification
system, but contains one type I system. In contrast, two type II and
three type I systems were identified in M. jannaschii. No homologue
of the M. jannaschii thermonuclease was identified.

The cell-division machinery is similar to that of M. jannaschii,
with orthologues of eubacterial fts and eukaryal cdc genes. However,
several cdc genes found in M. jannaschii, including homologues of
cdc23, cdc27, cdc47 and cdc54, appear to be absent in A. fulgidus.

Transcription and translation 

A. fulgidus and M. jannaschii have transcriptional and translational
systems distinct from their eubacterial and eukaryal counterparts.
In both, the RNA polymerase contains the large universal subunits
and five smaller subunits found in both Archaea and eukaryotes.
Transcription initiation is a simplified version of the eukaryotic
mechanism 28,29. However, A. fulgidus alone has a homologue of
eukaryotic TBP-interacting protein 49 not seen in M. jannaschii, but
apparently present in Sulfolobus solfataricus.

Translation in A. fulgidus parallels M. jannaschii with a few
exceptions. The organism has only one rRNA operon with an Ala-tRNA
gene in the spacer and lacks a contiguous 5S rRNA gene.
Genes for 46 tRNAs were identified, five of which contain introns in
the anticodon region that are presumably removed by the intron
excision enzyme EndA. The gene for selenocysteine tRNA (SelC)
was not found, nor were the genes for SelA, SelB and SelD. With the
exception of Asp-tRNA GTC and Val-tRNA CAC, tRNA genes are not
linked in the A. fulgidus genome. The RNA component of the tRNA
maturation enzyme RNase P is present. Both A. fulgidus and M.
jannaschii appear to possess an enzyme that inserts the tRNA-modified
nucleoside archaeosine, but only A. fulgidus has the related
enzyme that inserts the modified base queuine.

Both A. fulgidus and M. jannaschii lack glutaminyl-tRNA synthetase and
asparaginyl-tRNA synthetase; the relevant tRNAs are presumably aminoacylated
with glutamic and aspartic acids, respectively. An enzymatic
in situ transamidation then converts the amino acid to its amide
form, as seen in other Archaea and in Gram-positive Eubacteria 30.
Indeed, genes for the three subunits of the Glu-tRNA amidotransferase
(gatABC) have been identified in A. fulgidus. The Lys
aminoacyl-tRNA synthetase in both organisms is a class I-type,
not a class II-type 31. A. fulgidus possesses a normal tRNA synthetase
for both Cys and Ser, unlike M. jannaschii in which the former was
not identifiable and the latter was unusual 9.

M. jannaschii has a single gene belonging to the TCP-1 chaperonin
family, whereas A. fulgidus has two that encode subunits α and
β of the thermosome. Phylogenetic analysis of the archaeal TCP-1
family indicates that these A. fulgidus genes arose by a recent species-specific
gene duplication, as is the case for the two subunits of the
Thermoplasma acidophilum thermosome 32 and the Sulfolobus
shibatae rosettasome 33. As in M. jannaschii, no dnaK gene was
identified.

Biosynthesis of essential components 

Like most autotrophic microorganisms, A. fulgidus is able to
synthesize many essential compounds, including amino acids,
cofactors, carriers, purines and pyrimidines. Many of these biosynthetic
pathways show a high degree of conservation between A.
fulgidus and M. jannaschii. These two Archaea are similar in their
biosynthetic pathways for siroheme, cobalamin, molybdopterin,
riboflavin, thiamin and nicotinate, the role category with greatest
conservation between these two organisms being amino-acid biosynthesis.
Of 78 A. fulgidus genes assigned to amino-acid biosynthetic
pathways, at least 73 (94%) have homologues in M.
jannaschii. For both archaeal species, amino-acid biosynthetic
pathways resemble those of Bacillus subtilis more closely than
those of E. coli. For example, in A. fulgidus and M. jannaschii,
tryptophan biosynthesis is accomplished by seven enzymes, TrpA,
B, C, D, E, F, G, as in B. subtilis, rather than by five enzymes, TrpA, B,
C, D, E (including the bifunctional TrpC and TrpD) as found in E.
coli.

No biotin biosynthetic genes were identified, yet biotin can be 
detected in A. fulgidus cell extracts 34 , and several genes encode a 
biotin-binding consensus sequence. Similarly, A. fulgidus lacks the 
genes for pyridoxine biosynthesis although pyridoxine can be found 
in cell extracts (albeit at lower levels than seen in E. coli and several 
Archaea 34 ). No gene encoding ferrochelatase, the terminal enzyme 
in haem biosynthesis, has been identified, although A. fulgidus is 
known to use cytochromes 34 . These cofactors may be obtained by 
mechanisms that we have not recognized. Although all of the 
enzymes required for pyrimidine biosynthesis appear to be present, 
three enzymes in the purine pathway (GAR transformylase, AICAR 
formyltransferase and the ATPase subunit of AIR carboxylase) have 
not been identified, presumably because they exist as new isoforms. 

The Archaea share a unique cell membrane composed of ether
lipids containing a glycerophosphate backbone with a 2,3-sn
stereochemistry 35 for which there are multiple biosynthetic
pathways 36. In the case of Halobacterium cutirubrum, the backbone is
apparently obtained by enantiomeric inversion of sn-glycerol-3-phosphate;
in Sulfolobus acidocaldarius and Methanobacterium
thermoautotrophicum, sn-glycerol-1-phosphate dehydrogenase builds
the backbone from dihydroxyacetonephosphate. An orthologue of
sn-glycerol-1-phosphate dehydrogenase has been identified in
A. fulgidus, suggesting that the latter pathway is present.

Conclusions 

Although A. fulgidus has been studied since its discovery ten years
ago 1, the completed genome sequence provides a wealth of new
information about how this unusual organism exploits its environment.
For example, its ability to reduce sulphur oxides has been well
characterized, but genome sequence data demonstrate that A.
fulgidus has a great diversity of electron transport systems, some
of unknown specificity. Similarly, A. fulgidus has been characterized
as a scavenger with numerous potential carbon sources, and its gene
complement reveals the extent of this capability. A. fulgidus appears
to obtain carbon from fatty acids through β-oxidation, from
degradation of amino acids, aldehydes and organic acids, and
perhaps from CO.

A. fulgidus has extensive gene duplication in comparison with
other fully sequenced prokaryotes. For example, in the fatty acid
and phospholipid metabolism category, there are 10 copies of
3-hydroxyacyl-CoA dehydrogenase, 12 copies of 3-ketoacyl-CoA
thiolase, and 12 of acyl-CoA dehydrogenase. The duplicated proteins
are not identical, and their presence suggests considerable
metabolic differentiation, particularly with respect to the pathways
for decomposing and recycling carbon by scavenging fatty acids.
Other categories show similar, albeit less dramatic, gene redundancy.
For example, there are six copies of acetyl-CoA synthetase
and four aldehyde ferredoxin oxidoreductases for fermentation, as
well as four copies of aspartate aminotransferase for amino-acid
biosynthesis. These observations, together with the large number of
paralogous gene families, suggest that gene duplication has been an
important evolutionary mechanism for increasing physiological
diversity in the Archaeoglobales.

A comparison of two archaeal genomes is inadequate to assess the
diversity of the entire domain. Given this caveat, it is nevertheless
possible to draw some preliminary conclusions from the comparison
of M. jannaschii and A. fulgidus. A comparison of the gene
content of these Archaea reveals that gene conservation varies
significantly between role categories, with genes involved in transcription,
translation and replication highly conserved; approximately
80% of the A. fulgidus genes in these categories have
homologues in M. jannaschii. Biosynthetic pathways are also
highly conserved, with approximately 80% of the A. fulgidus
biosynthetic genes having homologues in M. jannaschii. In contrast,
only 35% of the A. fulgidus central intermediary metabolism genes
have homologues, reflecting their minimal metabolic overlap.

Over half of the A. fulgidus ORFs (1,290) have no assigned
biological role. Of these, 639 have no database match. The remaining
651, designated 'conserved hypothetical proteins', have sequence
similarity to hypothetical proteins in other organisms, two-thirds
with apparent homologues in M. jannaschii. These shared hypothetical
proteins will probably add to our understanding of the genetic
repertoire of the Archaea. Analysis of the A. fulgidus and other
archaeal and eubacterial genomes will provide the information
necessary to begin to define a core set of archaeal genes, as well as
to better understand prokaryotic diversity.



Methods 

Whole-genome random sequencing procedure. The type strain, A. fulgidus
VC-16, was grown from a culture derived from a single cell isolated by optical
tweezers 37 and provided by K. O. Stetter (University of Regensburg). Cloning,
sequencing and assembly were essentially as described previously for genomes
sequenced by TIGR* 3 *" 40 . One small-insert and one medium-insert plasmid
library were generated by random mechanical shearing of genomic DNA. One
large-insert lambda (λ) library was generated by partial Tsp509I digestion and
ligation to λ-DASHII/EcoRI vector (Stratagene). In the initial random sequencing
phase, 6.7-fold sequence coverage was achieved with 27,150 sequences
from plasmid clones (average read length 500 bases) and 1,850 sequences from
λ-clones. Both plasmid and λ-sequences were jointly assembled using TIGR
assembler 41, resulting in 152 contigs separated by sequence gaps and five groups
of contigs separated by physical gaps. Sequences from both ends of 560 λ-clones
served as a genome scaffold, verifying the orientation, order and integrity of
the contigs. Sequence gaps were closed by editing the ends of sequence traces
and/or primer walking on plasmid or λ-clones spanning the respective
gap. Physical gaps were closed by combinatorial polymerase chain reaction
(PCR) followed by sequencing of the PCR product. At the end of gap closure, 90
regions representing 0.33% of the genome had only single-sequence coverage.
These regions were confirmed with terminator reactions to ensure a minimum
of 2-fold sequence coverage for the whole genome. The final genome sequence
is based on 29,642 sequences, with a 6.8-fold sequence coverage. The linkage
between the terminal sequences of 2,101 clones from the small-insert plasmid
library (average size 1,419 bp) and 8,726 clones from the medium-insert
plasmid library (average size 2,954 bp) supported the genome scaffold
formed by the λ-clones (average size 16,381 bp), with 96.9% of the genome
covered by λ-clones. The reported sequence differs in 20 positions from the
14,389 bp of DNA in a total of 11 previously published A. fulgidus genes.
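
The fold-coverage figures quoted above follow from the usual ratio of sequenced bases to genome length. The sketch below is only an illustration: the genome length of 2,178,400 bp is an assumption (it is not stated in this section), and the 500-base average read length quoted for plasmid reads is assumed to apply to the λ-clone reads and to the final read set as well.

```python
def fold_coverage(n_reads: int, mean_read_len: float, genome_len: int) -> float:
    """Average sequence coverage = total sequenced bases / genome length."""
    return n_reads * mean_read_len / genome_len

# Assumption: genome length in bp; this value is not given in the Methods text.
GENOME_LEN = 2_178_400
# Average read length quoted for plasmid reads; assumed here for all reads.
MEAN_READ_LEN = 500

# Initial random phase: 27,150 plasmid reads plus 1,850 lambda-clone reads.
random_phase = fold_coverage(27_150 + 1_850, MEAN_READ_LEN, GENOME_LEN)
# Final assembly: 29,642 reads in total.
final = fold_coverage(29_642, MEAN_READ_LEN, GENOME_LEN)

print(f"random phase: {random_phase:.1f}-fold, final: {final:.1f}-fold")
```

Under these assumptions the estimate agrees with the 6.7-fold and 6.8-fold values reported in the text.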
ORF prediction and gene family identification. Coding regions (ORFs) were
identified using a combination strategy based on two programs. Initial sets of
ORFs were derived with GeneSmith (H.O.S., unpublished), a program that
evaluates ORF length, separation and overlap between ORFs, and with
CRITICA (J.H.B. & G.J.O., unpublished), a coding region identification tool
using comparative analysis. The two largely overlapping sets of ORFs were
merged into one joint set containing all members of both initial sets. ORFs were
searched against a non-redundant protein database using BLASTX 10 and those
shorter than 30 codons 'coding' for proteins without a database match were
eliminated. Frameshifts were detected and corrected where appropriate as
described previously 40. Remaining frameshifts are considered authentic and
corresponding regions were annotated as 'authentic frameshift'. In total, 527
hidden Markov models, based upon conserved protein families (PFAM version
2.0), were searched with HMMER to determine ORF membership in families
and superfamilies 42. Families of paralogous genes were constructed as described
previously 40. TopPred 43 was used to identify membrane-spanning domains in
proteins.
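
The merge-and-filter step described above (take the union of the two ORF sets, then drop ORFs that are shorter than 30 codons and have no database match) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Orf` record, its fields, the `merge_and_filter` helper and the example coordinates are all hypothetical, and the GeneSmith, CRITICA and BLASTX outputs are stood in for by plain Python sets.

```python
from dataclasses import dataclass

MIN_CODONS = 30  # length threshold quoted in the text


@dataclass(frozen=True)
class Orf:
    # Hypothetical record: 1-based inclusive coordinates and strand.
    start: int
    end: int
    strand: str

    @property
    def codons(self) -> int:
        return (self.end - self.start + 1) // 3


def merge_and_filter(set_a, set_b, has_db_match):
    """Union two ORF sets, then discard ORFs that are both shorter than
    MIN_CODONS and without a database match."""
    merged = set(set_a) | set(set_b)
    return {orf for orf in merged
            if orf.codons >= MIN_CODONS or has_db_match(orf)}


# Hypothetical example data standing in for the two predictors' output.
predictor_a = {Orf(1, 300, "+"), Orf(400, 460, "+")}
predictor_b = {Orf(1, 300, "+"), Orf(500, 560, "-")}
matched = {Orf(500, 560, "-")}  # short ORF rescued by a database match

kept = merge_and_filter(predictor_a, predictor_b, lambda orf: orf in matched)
```

In this toy input the 20-codon ORF with a database match is retained, while the 20-codon ORF without one is removed; long ORFs are kept regardless of match status.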
Table 2. List of A. fulgidus genes with putative identification. Gene numbers correspond to those in Fig. 2. Percentages
represent per cent identities.



AMINO ACID BIOSYNTHESIS
General
AF0906 hydantoin utilization protein A (hyuA)
Aromatic amino acid family
AF0228 3-dehydroquinate dehydratase (aroD)
AF1497 5-enolpyruvylshikimate 3-phosphate synthase (aroA)
AF1603 anthranilate synthase component I (trpE)
AF1604 anthranilate synthase component II (trpD)
AF1602 anthranilate synthase component II (trpG)
AF0227 chorismate mutase/prephenate dehydratase (pheA)
AF0670 chorismate synthase (aroC)
AF1601 phosphoribosyl anthranilate isomerase (trpF)
AF2327 shikimate 5-dehydrogenase (aroE)
AF0343 tryptophan repressor binding protein (wrbA)
AF1599 tryptophan synthase, subunit alpha (trpA)
AF1240 tryptophan synthase, subunit beta (trpB-1)
AF1600 tryptophan synthase, subunit beta (trpB-2)
Aspartate family
AF2112 5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase (metE)
AF0882 asparaginase (asnA)
AF1439 asparagine synthetase (asnB)
AF2366 aspartate aminotransferase (aspB-1)
AF2129 aspartate aminotransferase (aspB-2)
AF1623 aspartate aminotransferase (aspB-3)
AF0409 aspartate aminotransferase (aspB-4)
aspartate aminotransferase (aspC)
aspartate kinase (lysC)
AF1422 aspartate racemase
AF1506 aspartate-semialdehyde dehydrogenase (asd)
AF0800 diaminopimelate decarboxylase (lysA)
AF0747 diaminopimelate epimerase (dapF)
AF0909 dihydrodipicolinate reductase (dapB)
AF0910 dihydrodipicolinate synthase (dapA)
AF0935 homoserine dehydrogenase (hom)
AF0886 S-adenosylhomocysteinase hydrolase (ahcY-1)
AF2000 S-adenosylhomocysteinase hydrolase (ahcY-2)
AF0051 succinyl-diaminopimelate desuccinylase (dapE-1)
AF0904 succinyl-diaminopimelate desuccinylase (dapE-2)
AF0651 threonine synthase (thrC-1)
AF1316 threonine synthase (thrC-2)
Glutamate family
AF1280 acetylglutamate kinase (argB)
AF2288 acetylglutamate kinase, putative
AF0080 acetylornithine aminotransferase (argD-1)
AF1815 acetylornithine aminotransferase (argD-2)
AF0522 acetylornithine deacetylase (argE)
AF0883 argininosuccinate lyase (argH)
AF2252 argininosuccinate synthetase (argG)
AF1147 glutamate N-acetyltransferase (argJ)
AF0953 glutamate synthase (gltB)
AF0949 glutamine synthetase (glnA)
AF2071 N-acetyl-gamma-glutamyl-phosphate reductase (argC)
AF1256 ornithine carbamoyltransferase (argF)

36.8%
41.6%
43.7% 
43.8% 
50.0% 
32.2% 
56.3% 
37.1% 
43.1% 
46.6% 
39.5% 
39.4% 
64.1% 



AF1417 
AF0700



Pyruvate family
AF0957 2-isopropylmalate synthase (leuA-1)
AF0219 2-isopropylmalate synthase (leuA-2)
AF2199 3-isopropylmalate dehydratase, large subunit (leuC)
AF0629 3-isopropylmalate dehydratase, small subunit (leuD-1)
AF1761 3-isopropylmalate dehydratase, small subunit (leuD-2)
AF0626 3-isopropylmalate dehydrogenase (leuB)
AF1720 acetolactate synthase, large subunit (ilvB-1)
AF1780 acetolactate synthase, large subunit (ilvB-2)
AF2015 acetolactate synthase, large subunit (ilvB-3)
AF2100 acetolactate synthase, large subunit (ilvB-4)
AF1719 acetolactate synthase, small subunit (ilvN)
AF1672 acetolactate synthase, small subunit, putative
AF0933 branched-chain amino acid aminotransferase (ilvE)
AF1014 dihydroxy-acid dehydratase (ilvD)
AF1985 ketol-acid reductoisomerase (ilvC)
Serine family
AFOfiia phosphoglycerate dehydrogenase (serA)
AF2138 phosphoserine phosphatase (serB)
AF0273 sarcosine oxidase, subunit alpha (soxA)
AF0274 sarcosine oxidase, subunit beta (soxB)
AF0852 serine hydroxymethyltransferase (glyA)
Histidine family
AF0590 ATP phosphoribosyltransferase (hisG)
AF0212 histidinol dehydrogenase (hisD)
AF2002 histidinol-phosphate aminotransferase (hisC-1)
AF2024 histidinol-phosphate aminotransferase (hisC-2)
imidazoleglycerol-phosphate dehydrogenase/histidinol-phosphatase (hisB)
AF0819 imidazoleglycerol-phosphate synthase, cyclase subunit (hisF)
AF2265 imidazoleglycerol-phosphate synthase, subunit H (hisH)
AF0509 imidazoleglycerol-phosphate synthase, subunit H, putative
AF1950 phosphoribosyl-AMP cyclohydrolase/phosphoribosyl-ATP pyrophosphohydrolase (hisIE)
AF0713 phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (hisA-1)
phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (hisA-2)



28.1% 
45.9% 
36.9% 
42.3%
46.4% 
39.4% 
45.2% 
46.2% 
49.1% 
48.0% 
60.9% 
45.6% 
45.8% 
48.6% 
51.0% 
47.9% 
31.7% 
67.5%
30.5% 
43.8% 
40.5% 
61.0% 

66.1% 
29.0% 
48.3% 
36.2% 
29.4% 
42.2% 
62.0% 
47.8% 
57.9% 
43.3% 

53.3% 
51.7% 

53.5%
53.9%
4&.3% 
56.4% 
57.1% 
59.2% 
57.5% 
32.1% 
34.1% 
38.4% 
60.4%
29.7% 
59.0% 
54.5% 
61.6% 

48.8% 
50.7% 
31.1% 
26.5% 



AF0985 



31.6% 
51.6% 



42.2% 
67.0%
44.4% 
43.2% 
59.6% 
37.5% 
42.2% 



27.2%
30.3% 
31.9% 
30.4% 
31.2%

40.8% 



Folic acid 

AF1414 dihydropteroate synthase 
Heme and porphyrin 

AF1648 bacteriochlorophyll synthase. 33 kDa subunit 27.9% 

AF0464 bacteriochlorophyll synihase, 43 kDa subunit (chlP-1) 29.7% 

AF1023 bacteriochiorophyli synthase, 43 kDa subunit (ohlP-2) 312% 

AF1637 bacteriochlorophyll synthase, 43 kDa subunit (chlP-3) 27.0% 

AFO037 cobalamin (5'-phosphate) synthase (cobS-1) 33.9% 

AF2323 cobalamin (S'-phosphflte) synthase (cobS-2) 34.4% 

AF0725 cobalamin biosynthesis precorrin methylase (cbiG) 30.7% 
AF0727 cobalamin biosynthesis precorrin-2 methyltransferase 

(cbiL) 31.5% 

AF0726 cobalamin biosynthesis preconin-3 methylase (cbiF) 49-2% 

AF0724 cobalamin biosynthesis pracorrin-a methylase (chiH) 49.0% 



AF0722 cobaiamin biosynthesis precorrin-6Y methylase (cbiE) 32.4% 
AF0732 cobalamin biosynthesis precorrin-8W 

decarboxylase (cbiT) 30.8% 

AF1336 cobalamin biosynthesis protein (cblB) 38.4% 

AF0723 cobalamin biosynthesis protein (cbiD) 36.3% 

AF0728 cobalamin biosynthesis protein (cblM-1 ) 61 .4% 

AF1843 cobalamin biosynthesis protein (cblM-2) 412% 

AF0731 cobalt transport ATP-binding protein (cbiO-1) 472% 

AF1841 cobalt transport ATP-binding protein (cbiO-2) 41.1% 

AF0729 cobalt transport protein (cbiN) 56.0% 

AF0730 cobalt transport protein (cblQ-1 ) 32.6% 

AF1842 cobalt transport protein (cbiQ-2) 30.3% 

AF1338 cobyric acid synthase (cbiP) 44.6%

AF2229 cobyrinic acid a,c-diamide synthase (cbiA) 42.3%

AF1241 glutamate-1-semialdehyde aminotransferase (hemL) 54.3%

AF1975 glutamyl-tRNA reductase (hemA) 42.7%

AF1694 heme biosynthesis protein (nirH) 25.2%

AF1126 heme biosynthesis protein (nirJ-1) 38.7%

AF2009 heme biosynthesis protein (nirF2) 31 Wh 

AF1593 heme d1 biosynthesis protein (nirD) 29.4%
AF1311 oxygen-independent coproporphyrinogen III oxidase, putative 27.1%

AF1242 porphobilinogen deaminase (hemC) 46.3%

AF1974 porphobilinogen synthase (hemB) 60.4%

AF1784 protoporphyrinogen oxidase (hemK) 33.6%

AF0422 uroporphyrin-III C-methyltransferase (cysG-1) 41.7%

AF1243 uroporphyrin-III C-methyltransferase (cysG-2) 52.6%

AF0116 uroporphyrinogen III synthase (hemD) 27.4%

Menaquinone and ubiquinone 

AF2176 4-hydroxybenzoate octaprenyltransferase (ubiA) 41.6%

AF0404 4-hydroxybenzoate octaprenyltransferase, putative 30.6%

AF2413 coenzyme PQQ synthesis protein (pqqE) 30.5%

AF1191 dihydroxynaphthoic acid synthase (menB) 64.6%

AF1551 octaprenyl-diphosphate synthase (ispB) 33.2%

AF0140 ubiquinone/menaquinone biosynthesis methyltransferase (ubiE) 31.0%

Molybdopterin

AF2006 molybdenum cofactor biosynthesis protein (moaA) 47.8% 

AF0265 molybdenum cofactor biosynthesis protein (moeB) 44.4% 

AF2150 molybdenum cofactor biosynthesis protein (moaC) 92.0%

AF0931 molybdenum cofactor biosynthesis protein (moeA-1) 50.8%

AF0930 molybdenum cofactor biosynthesis protein (moeA-2) 44.8%

AF0161 molybdenum cofactor biosynthesis protein (moeA-3) 30.5%

AF0531 molybdenum cofactor biosynthesis protein (moeH) 44.0%

AF1022 molybdenum-pterin-binding protein (mopB) 39.3%

AF1624 molybdopterin converting factor, subunit 1 (moaD) 36.6%

AF2179 molybdopterin converting factor, subunit 2 (moaE) 33.3%
AF2005 molybdopterin-guanine dinucleotide biosynthesis protein A (mobA) 33.2%
AF2253 molybdopterin-guanine dinucleotide biosynthesis protein B (mobB) 40.0%

Pantothenate

AF1646 pantothenate metabolism flavoprotein (dfp) 42.4%
Riboflavin

AF0484 GTP cyclohydrolase II (ribA-1) 44.5%
AF2107 GTP cyclohydrolase II (ribA-2) 47.1%
AF1416 riboflavin synthase (ribC) 53.3%
AF2128 riboflavin synthase, subunit beta (ribE) 75.9%
AF2007 riboflavin-specific deaminase (ribG) 43.7%

Thiamine
AF2075 hydroxyethylthiazole kinase (thiM) 33.6%
AF2208 hydroxymethylpyrimidine phosphate kinase (thiD) 35.5%
AF1595 thiamine biosynthesis protein (apbA) 36.9%
AF2412 thiamine biosynthesis protein (thiC) 60.2%
AF0553 thiamine biosynthesis protein (thiF) 38.1%
AF0088 thiamine biosynthesis protein, putative 28.2%
AF0702 thiamine biosynthetic enzyme (thi1) 50.0%
AF0733 thiamine monophosphate kinase (thiL) 30.4%
AF2074 thiamine phosphate pyrophosphorylase (thiE) 45.5%

Pyridine nucleotides

AF1000 NH(3)-dependent NAD+ synthetase (nadE) 52.0%
AF1839 nicotinate-nucleotide pyrophosphorylase (nadC) 43.2%
AF1837 quinolinate synthetase (nadA), authentic frameshift 53.9%

CELL ENVELOPE

Membranes, lipoproteins, and porins
AF1420 membrane protein 51.8%
AF1354 membrane protein, putative 32.8%




BIOSYNTHESIS OF COFACTORS, PROSTHETIC GROUPS, AND CARRIERS
General

AF1855 2,3-dihydroxybenzoate-AMP ligase (entE)

AF1070 coenzyme F390 synthetase (ftsA-1)

AF1671 coenzyme F390 synthetase (ftsA-2)

AF2013 coenzyme F390 synthetase (ftsA-3)

AF2151 isochorismatase (entB)



Surface polysaccharides, lipopolysaccharides and antigens

AF0324 dTDP-glucose 4,6-dehydratase (rfbB)

AF0043 first mannosyl transferase (wbaZ-1)

AF0606 first mannosyl transferase (wbaZ-2)

AF1728 galactosyltransferase

AF0044 GDP-D-mannose dehydratase (gmd-1), authentic frameshift
AF1142 glucose-1-phosphate cytidylyltransferase (rfbF)
AF0242 glucose-1-phosphate thymidylyltransferase (graD-1)
AF0325 glucose-1-phosphate thymidylyltransferase (graD-2)
AF0321 glycosyl transferase
AF0387 glycosyltransferase, putative
AF0457 immunogenic protein (bcsp31-1)
AF0635 immunogenic protein (bcsp31-2)
AF0988 immunogenic protein (bcsp31-3)
AF0602 LPS biosynthesis protein, putative
AF0617 LPS biosynthesis protein, putative
AF0607 LPS glycosyltransferase, putative
AF0326 mannose-1-phosphate guanylyltransferase (rfbM), authentic frameshift
AF1097 mannose-6-phosphate isomerase/mannose-1-phosphate guanylyltransferase (manC)
AF0036 mannosephosphate isomerase, putative
AF0045 mannosyltransferase A (mtfA)
AF0311 O-antigen biosynthesis protein (rfbC), authentic frameshift

AF0458 phosphomannomutase (pmm)

AF0695 polysaccharide biosynthesis protein, putative

AF0322 rhamnosyl transferase (rfbQ)

AF0323 spore coat polysaccharide biosynthesis protein (spsK-2), authentic frameshift
AF0620 succinoglycan biosynthesis protein (exoM)
AF0361 UDP-glucose 4-epimerase (galE-1)
AF2016 UDP-glucose 4-epimerase (galE-2)
AF0302 UDP-glucose dehydrogenase (ugd-1)
AF0596 UDP-glucose dehydrogenase (ugd-2)

Surface structures
AF1054 flagellin (flaB1-1)
AF1065 flagellin (flaB1-2)
AF0275 surface layer protein B (slgB-1)
AF1413 surface layer protein B (slgB-2)



50.0% 
30.0% 
29.0% 
26.9%

40.7% 
38.6% 
27.7% 
45.2% 
30.7% 
33.8% 
34.7% 
44.3% 
28.3% 
29.6% 
29.0% 
29.7% 

42.4% 



38.7% 

30.6% 
39.6% 
24.1% 
27.5% 



24.8% 
38.6% 
30.0% 



30.0%
31.1% 
30.8% 
29.9% 



CELLULAR PROCESSES

General
AF1040 chemotaxis histidine kinase (cheA)
AF1036 chemotaxis histidine kinase, putative
AF1036 chemotaxis histidine kinase, putative
AF1037 chemotaxis protein methyltransferase (cheR)
AF1042 chemotaxis response regulator (cheY)
AF1034 methyl-accepting chemotaxis protein (tlpC-1)
AF1045 methyl-accepting chemotaxis protein (tlpC-2)
AF1041 protein-glutamate methylesterase (cheB)
AF1032 purine NTPase, putative
AF1044 purine-binding chemotaxis protein (cheW)



Cell division 

AF0617 cell division control protein 21 (cdc21)

AF1297 cell division control protein 48, AAA family (cdc48-1)

AF2098 cell division control protein 48, AAA family (cdc48-2)

AF0244 cell division control protein 6, putative

AF1285 cell division control protein, AAA family, putative

AF0696 cell division inhibitor (minD-1)

AF1937 cell division inhibitor (minD-2)

AF2061 cell division protein (ftsJ)

AF0536 cell division protein (ftsZ-1)

AF0670 cell division protein (ftsZ-2)

AF0837 cell division protein pelota (pelA)

AF1215 cell division protein, putative

AF0238 centromere/microtubule-binding protein (cbf5)

AF1568 chromosome segregation protein (smc1)

AF1822 serine/threonine phosphatase (ppa)

Chaperones

AF1296 small heat shock protein (hsp20-1)

AF19?1 small heat shock protein (hsp20-2)

AF2238 thermosome, subunit alpha (thsA)

AF1461 thermosome, subunit beta (thsB)

Chromosome-associated protein

AF0337 archaeal histone A1 (hpyA1-1)

AF1493 archaeal histone A1 (hpyA1-2)

Detoxification

AF2173 2-nitropropane dioxygenase (ncd2)

AF0270 alkyl hydroperoxide reductase

AF1361 arsenate reductase (arsC)

AF0560 N-ethylammeline chlorohydrolase (trzA-1)

AF0997 N-ethylammeline chlorohydrolase (trzA-2)

AF0254 NADH oxidase (noxA-1)

AF0395 NADH oxidase (noxA-2)

AF0400 NADH oxidase (noxA-3)

AF0951 NADH oxidase (noxA-4)

AF1858 NADH oxidase (noxA-5)

AF0465 NADH oxidase (noxB-1)

AF1262 NADH oxidase (noxB-2)

AF0226 NADH oxidase (noxC)

AF0515 NADH oxidase, putative



41.9% 
25.3% 
30.4% 
33.2%
62.9% 
27.6% 
29.6% 
43.3% 
32.2%
40.4% 

82.8%

:-«ft1% 
62.0% 
27.6% 
49.3%
56.0% 
32.8% 
40.8% 
60.4% 
61.4% 
41.7% 
32.8%
58.8% 
32.8% 
31.9% 

52.3% 
38.1% 
70.6% 
68.2%

64.6% 
69.7% 

39.7% 
73.5% 
30.6% 
45.9% 
44.5% 
35.1% 
35.5% 
40.8% 
36.7% 
34.0% 
43.3% 
42.9% 
38.4% 
25.5% 
62.9% 

50.0% 
25.0% 
54.8% 
36.6% 
51.2%
36.3% 
47.0% 
34.5% 
38.5% 
38.2% 
41.7% 
46.5% 



AF2233 peroxidase / catalase (perA) 
Protein and peptide secretion 

AF1902 protein translocase, subunit SEC61 alpha (secY)
AF0636 protein translocase, subunit SEC61 gamma (secE)
AF2062 signal recognition particle receptor (dpa)
AF1258 signal recognition particle, subunit SRP19 (srp19)
AF0622 signal recognition particle, subunit SRP54 (srp54)
AF1791 signal sequence peptidase (sec11)
AF1667 signal sequence peptidase (spc21)
AF1658 signal sequence peptidase, putative
AF0338 type II secretion system protein (gspE-1)
AF0659 type II secretion system protein (gspE-2)
AF0996 type II secretion system protein (gspE-3)
AF1049 type II secretion system protein (gspE-4)
CENTRAL INTERMEDIARY METABOLISM 
Degradation of polysaccharides 

AF1207 2-deoxy-D-gluconate 3-dehydrogenase (kduD) 46.3%

AF1795 endoglucanase (celM) 55.4%

Phosphorus compounds

AF0756 exopolyphosphatase (ppx) 55.1%
Polyamine biosynthesis

AF0646 agmatinase (speB) 33.3%

AF2334 spermidine synthase (speE) 37.1%

Polysaccharides (cytoplasmic)

AF0599 dolichol phosphate mannose synthase, putative 32.1%
Sulfur metabolism

AF0286 adenylylsulfate 3-phosphotransferase (cysC) 52.0%

AF1670 adenylylsulfate reductase, subunit A (aprA) 96.0%

AF1669 adenylylsulfate reductase, subunit B (aprB) 97.3%

AF1667 sulfate adenylyltransferase (sat) 28.4%
AF2228 sulfite reductase, desulfoviridin-type subunit gamma (dsvC) *1-3%

AF0423 sulfite reductase, subunit alpha (dsrA) 100.0%

AF0424 sulfite reductase, subunit beta (dsrB) 100.0%

AF0425 sulfite reductase, subunit gamma (dsrC) 97.4%

Other 

AF1706 2-hydroxy-6-oxo-6-phenylhexa-2,4-dienoic acid hydrolase (pcbD) 29.4%
AF0676 2-hydroxy-6-oxohepta-2,4-dienoate hydrolase (todF) 26.3%
AF0091 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase (hpcE-1) 44.5%
AF2225 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase (hpcE-2) 66.0%

AF0333 4-hydroxyphenylacetate-3-hydroxylase (hpaA-1) 22.4%

AF0885 4-hydroxyphenylacetate-3-hydroxylase (hpaA-2) 26.0%

AF1027 4-hydroxyphenylacetate-3-hydroxylase (hpaA-3) 21.0%

AF0669 4-oxalocrotonate tautomerase, putative 31.9%

AF0808 glycolate oxidase subunit (glcD) 32.0%
AF2216 methylmalonyl-CoA decarboxylase, biotin carboxyl carrier subunit (mmdC) 36.2%
AF2217 methylmalonyl-CoA decarboxylase, subunit alpha (mmdA) 62.6%
AF1288 methylmalonyl-CoA mutase, subunit alpha (mutB), authentic frameshift *6- 1< *
AF2219 methylmalonyl-CoA mutase, subunit alpha, C-terminus (mcmA2) 48.7%
AF2215 methylmalonyl-CoA mutase, subunit alpha, N-terminus (mcmA1) 51 ^%

AF2099 muconate cycloisomerase II (clcB) 24.9%

AF1425 phosphonopyruvate decarboxylase (bcpC-1) 35.0%

AF1751 phosphonopyruvate decarboxylase (bcpC-2) 48.8%

ENERGY METABOLISM 

Amino acids and amines 

AF1958 2-hydroxyglutaryl-CoA dehydratase, subunit alpha (hgdA) 30.5%
AF1957 2-hydroxyglutaryl-CoA dehydratase, subunit beta (hgdB)

AF0130 acetylpolyamine aminohydrolase (aphA)

AF2290 acetylpolyamine aminohydrolase, putative

AF0991 glutaryl-CoA dehydrogenase (gcdH)

AF1323 group II decarboxylase

AF2004 group II decarboxylase

AF2295 group II decarboxylase

AF1665 ornithine cyclodeaminase (arcB)

Anaerobic

AF1146 4-hydroxybutyrate CoA transferase (cat2-1)
AF1864 4-hydroxybutyrate CoA transferase (cat2-2)
AF0866 glycerol kinase (glpK)
AF1328 glycerophosphate dehydrogenase (glpA)
AF0871 glycerol-3-phosphate dehydrogenase (NAD(P)+) (gpsA)

AF0020 L-carnitine dehydratase (caiB-1)
AF0990 L-carnitine dehydratase (caiB-2)

ATP-proton motive force interconversion

AF1158 ATP synthase, subunit E, putative

AF1166 H+-transporting ATP synthase, subunit A (atpA)

AF1167 H+-transporting ATP synthase, subunit B (atpB)

AF1164 H+-transporting ATP synthase, subunit C (atpC)

AF1168 H+-transporting ATP synthase, subunit D (atpD)

AF1163 H+-transporting ATP synthase, subunit E (atpE)

AF1165 H+-transporting ATP synthase, subunit F (atpF)

AF1159 H+-transporting ATP synthase, subunit I (atpI)

AF1160 H+-transporting ATP synthase, subunit K (atpK-1)

AF1162 H+-transporting ATP synthase, subunit K (atpK-2)

Electron transport 

AF2036 cytochrome C oxidase folding protein (coxD) 
AF0144 cytochrome C oxidase, subunit II (cbaB) 
AF0142 cytochrome C oxidase, subunit II, putative 
AF0190 cytochrome C oxidase, subunit II, putative 
AF1057 cytochrome C-type biogenesis protein (ccdA)
AF2192 cytochrome C-type biogenesis protein (nrfE)
AF2296 cytochrome oxidase, subunit I (cydA-1)
AF2297 cytochrome oxidase, subunit I (cydA-2) 
AF2046 cytochrome oxidase, subunit I, putative 
AF0528 cytochrome^ hydrogenase, subunit gamma 
AF0833 desulfoferrodoxin (dfx) 
AF0344 desulfoferrodoxin, putative 
AF0287 electron transfer flavoprotein, subunit alpha (etfA) 
AF0286 electron transfer flavoprotein, subunit beta (etfB) 
AF1380 F420-nonreducing hydrogenase (vhtA)
AF1371 F420-nonreducing hydrogenase (vhtD-1)
AF1378 F420-nonreducing hydrogenase (vhtD-2) 
AF1381 F420-nonreducing hydrogenase (vhtG) 
AF1624 F420H2:quinone oxidoreductase, 11.2 kDa subunit, putative
AF1823 F420H2:quinone oxidoreductase, 16.5 kDa subunit, putative
AF1832 F420H2:quinone oxidoreductase, 32 kDa subunit (nuoI)
AF1833 F420H2:quinone oxidoreductase, 39 kDa subunit, putative
AF1829 F420H2:quinone oxidoreductase, 39.7 kDa subunit, putative
AF1831 F420H2:quinone oxidoreductase, 41.2 kDa subunit, putative
AF1827 F420H2:quinone oxidoreductase, 43.2 kDa subunit, putative
AF1830 F420H2:quinone oxidoreductase, 45 kDa subunit (nuoD)
AF1825 F420H2:quinone oxidoreductase, 53.9 kDa subunit (nuoM)
AF1826 F420H2:quinone oxidoreductase, 72.4 kDa subunit (nuoL)
AF0156 ferredoxin (fdx-1)
AF0166 ferredoxin (fdx-2)
AF0355 ferredoxin (fdx-3)
AF0427 ferredoxin (fdx-4)
AF0923 ferredoxin (fdx-5)
AF1010 ferredoxin (fdx-6)
AF1239 ferredoxin (fdx-7)
AF2142 ferredoxin (fdx-8)
AF0164 ferredoxin-nitrite reductase (nirA)
AF2332 flavodoxin, putative
AF0167 flavoprotein (fprA-1)
AF1520 flavoprotein (fprA-2)
AF0557 flavoprotein reductase

AF1463 fumarate reductase, flavoprotein subunit (fdrA)
AF1536 glutaredoxin (grx-1)
AF2145 glutaredoxin (grx-2)
AF0663 heterodisulfide reductase, subunit A (hdrA-1)
AF1377 heterodisulfide reductase, subunit A (hdrA-2)
AF0662 heterodisulfide reductase, subunit A/methylviologen reducing hydrogenase, subunit delta 34.2%
AF1238 heterodisulfide reductase, subunit A/methylviologen reducing hydrogenase, subunit delta 53.7%
AF1376 heterodisulfide reductase, subunit B (hdrB) 36.0%
AF0271 heterodisulfide reductase, subunit B, putative
AF1376 heterodisulfide reductase, subunit C (hdrC) 
AF0502 heterodisulfide reductase, subunit D, putative 
AF0809 heterodisulfide reductase, subunit D, putative 
AF0661 heterodisulfide reductase, subunit E, putative 



24.4% 
38.7% 
33.3% 
48.7% 
28.0% 
46.1% 
30.5% 
35.3% 

46.6% 
47.6%
33.8%
27.8%

36.3% 
33.3% 
31.2% 

47.1%
67.0% 
72.6% 
37.6% 
47.1% 
36.3%
46.0% 
30.1% 
46.3% 
46.3% 

33.3% 
34.2% 
38.0% 
31.7% 
30.7% 
36.1% 
22.9%
31.5% 
25.1% 
39.3% 
63.0% 
47.3%
39.7% 
38.8% 
34.8% 
30.9% 
33.1% 
46.1% 

24.1% 

25.7% 

95.5% 

33.6% 

43.6% 



80.0% 

32.1% 

33.2%
45.3% 
49.2% 
53.2% 



44.4% 
29.0% 
38.0% 
29.7% 
30.3% 
33.2% 
47.2% 
26.2%
27.0% 
34.3% 
38.8% 
42.2% 
48.8% 



33.3% 
33.8% 
100.0% 
23.8% 

AF0756 heterodisulfide reductase, subunits E and D, putative 31.8% 

AF0506 iron-sulfur binding reductase 38.5% 

AF1773 iron-sulfur binding reductase 33.3% 

AF1998 iron-sulfur binding reductase 29.6%

AF0627 iron-sulfur cluster binding protein 45.5%

AF0688 iron-sulfur cluster binding protein 44.8%

AF1153 iron-sulfur cluster binding protein 27.9%

AF1185 iron-sulfur cluster binding protein 36.7%

AF1263 iron-sulfur cluster binding protein 42.1%

AF2380 iron-sulfur cluster binding protein 35.3%

AF2381 iron-sulfur cluster binding protein 34.4%

AF2409 iron-sulfur cluster binding protein 28.2%

AF0076 iron-sulfur cluster binding protein 32.7%

AF1461 iron-sulfur cluster binding protein, putative 61.0%

AF1436 iron-sulfur flavoprotein (isf-1) 35.7%

AF1519 iron-sulfur flavoprotein (isf-2) 56.6%

AF1896 iron-sulfur flavoprotein (isf-3) 37.1%
AF1372 methylviologen-reducing hydrogenase, subunit alpha (vhuA) 39.4%
AF1374 methylviologen-reducing hydrogenase, subunit delta (vhuD) 41.7%
AF1373 methylviologen-reducing hydrogenase, subunit gamma (vhuG) 38.6%
AF0157 molybdopterin oxidoreductase, iron-sulfur binding subunit 38.6%

AF0174 molybdopterin oxidoreductase, membrane subunit 26.0%
AF0175 molybdopterin oxidoreductase, iron-sulfur binding subunit 42.0%
AF0176 molybdopterin oxidoreductase, molybdopterin 

binding subunit 32.6% 



AF0600 
AF1202 



AF2386 
AF0159 

AF2267 
AF0131 
AF2352 
AF1828 
AF0248 
AF0342 
AF0546 
AF0601 
AF1126 



AF1379 

AF0173 
AF0647 
AF0867 
AF0880 
AF1349 
AF0832 
AF0831 
AF1640 
AF2312 
AF0711 
AF0769 
AF1284 
AF2144 
AF1339 



molybdopterin oxidoreductase, iron-sulfur binding subunit

molybdopterin oxidoreductase, membrane subunit 27.9%
molybdopterin oxidoreductase, iron-sulfur binding subunit 36 b%

molybdopterin oxidoreductase, molybdopterin binding subunit 30.1%
molybdopterin oxidoreductase, molybdopterin binding subunit 34.6%
molybdopterin oxidoreductase, iron-sulfur binding

molybdopterin oxidoreductase, membrane subunit 30.3%
molybdopterin oxidoreductase, molybdopterin binding subunit, putative
NAD(P)H-flavin oxidoreductase
NAD(P)H-flavin oxidoreductase, putative
NADH dehydrogenase, subunit 1, putative
NADH dehydrogenase, subunit 3
NADH-dependent flavin oxidoreductase
nigerythrin, putative

nitrate reductase, gamma subunit (narI)
nitrate reductase, gamma subunit, putative
P460 cytochrome, putative
polyferredoxin (mvhB), authentic frameshift
quinone-reactive Ni/Fe-hydrogenase B-type cytochrome subunit (hydC)
reductase, assembly protein
reductase, iron-sulfur binding subunit
reductase, putative
rubredoxin (rd-1)
rubredoxin (rd-2)
rubrerythrin (rr1)
rubrerythrin (rr2)
rubrerythrin (rr3)
rubrerythrin (rr4)
thioredoxin (trx-1)
thioredoxin (trx-2)
thioredoxin (trx-3)
thioredoxin (trx-4)

ubiquinol-cytochrome C reductase complex, subunit VI requiring protein



TCA cycle 

AF1963 aconitase (acn) 67.1%
AF1340 citrate synthase (citZ) 60.3%
AF1098 fumarase (fum-1) 49.1%
AF1099 fumarase (fum-2) 53.4%
AF0647 isocitrate dehydrogenase, NADP (icd) 67.2%
AF1727 malate oxidoreductase (mae) 52.3%

AF0681 succinate dehydrogenase, flavoprotein subunit A (sdhA) 48.2%
AF0682 succinate dehydrogenase, iron-sulfur subunit B (sdhB) 51.3%



30.9% 
31.4% 
28.2%
28.9% 
24.3% 
36.7% 
33.3%
30.1% 
29.3% 
30.6% 
32.2%

29.0% 
30.0% 
28.3% 
33.3% 
69.2%
67.9%
46.7% 
63.7% 
37.8% 
41.4% 
28.4% 
38.5% 
62B% : 
48.9% 




Fermentation 

AF1779 2-hydroxyacid dehydrogenase, putative
AF0469 2-ketoglutarate ferredoxin oxidoreductase, subunit alpha (korA)
AF0468 2-ketoglutarate ferredoxin oxidoreductase, subunit beta (korB)
AF0470 2-ketoglutarate ferredoxin oxidoreductase, subunit delta (korD)
AF0471 2-ketoglutarate ferredoxin oxidoreductase, subunit gamma (korG)
AF2053 2-ketoisovalerate ferredoxin oxidoreductase, subunit alpha (vorA)
AF2052 2-ketoisovalerate ferredoxin oxidoreductase, subunit beta (vorB)
AF2064 2-ketoisovalerate ferredoxin oxidoreductase, subunit delta (vorD)
AF2055 2-ketoisovalerate ferredoxin oxidoreductase, subunit gamma (vorG)
AF0749 2-oxoacid ferredoxin oxidoreductase, subunit alpha (orA)
AF0750 2-oxoacid ferredoxin oxidoreductase, subunit beta (orB)
AF1286 acetoin utilization protein, putative
AF0197 acetyl-CoA synthetase (acs-1)
AF0366 acetyl-CoA synthetase (acs-2)
AF0677 acetyl-CoA synthetase (acs-3)
AF0975 acetyl-CoA synthetase (acs-4)
AF0976 acetyl-CoA synthetase (acs-5)
AF1287 acetyl-CoA synthetase (acs-6)
AF0024 alcohol dehydrogenase, iron-containing
AF0339 alcohol dehydrogenase, iron-containing
AF2019 alcohol dehydrogenase, iron-containing
AF2389-C acetyl-CoA synthetase, putative
AF2389-N acetyl-CoA synthetase, putative
AF2101 alcohol dehydrogenase, zinc-dependent
AF0023 aldehyde ferredoxin oxidoreductase (aor-1)
AF0077 aldehyde ferredoxin oxidoreductase (aor-2)
AF0340 aldehyde ferredoxin oxidoreductase (aor-3)
AF2281 aldehyde ferredoxin oxidoreductase (aor-4)
AF0006 corrinoid methyltransferase protein (mtaC-1)
AF0011 corrinoid methyltransferase protein (mtaC-2)
AF0394 D-lactate dehydrogenase, cytochrome-type (dld)
AF0560 formate dehydrogenase (fdhD1), authentic frameshift 32.9%
AF1199 glutaconate CoA-transferase, subunit A (gctA) 31.9%
AF1198 glutaconate CoA-transferase, subunit B (gctB), authentic frameshift
AF1489 indolepyruvate ferredoxin oxidoreductase, subunit alpha (iorA)
AF2030 indolepyruvate ferredoxin oxidoreductase, subunit beta (iorB)
AF0807 L-lactate dehydrogenase, cytochrome-type (lldD)
AF0856 L-malate dehydrogenase, NAD+-dependent (mdhA) 40.1%
AF2085 oxaloacetate decarboxylase, biotin carboxyl carrier subunit, putative 3&.7°h
AF2084 oxaloacetate decarboxylase, sodium ion pump subunit (oadB) 59e%
AF1262 oxaloacetate decarboxylase, subunit alpha (oadA)
AF1701 pyruvate ferredoxin oxidoreductase, subunit alpha (porA)
AF1702 pyruvate ferredoxin oxidoreductase, subunit beta (porB)
AF1700 pyruvate ferredoxin oxidoreductase, subunit delta (porD)
AF1699 pyruvate ferredoxin oxidoreductase, subunit gamma (porG) 50.8%
Gluconeogenesis 

AF0710 phosphoenolpyruvate synthase (ppsA) 



,.3*6% 

52.3% 

51.2%

47.2%

40.0% 

41.2%

42.7% 

51.6% 

45.2%

33.7% 

49.2% 
35.1% 
27.1% 
47.3% 
40.9% 
42.3%
36.2%
34.3%
36.2%
37.4% 
35.7% 
64.8% 
69.3% 
34.8% 
41.1% 
32.6% 
38.4% 
53.0% 
30.7% 
29.5% 
31.9% 



37.0% 



48.1% 



41.1% 
39.4% 



63.3% 



50.3% 



AF0683 succinate dehydrogenase, subunit C (sdhC)

AF0684 succinate dehydrogenase, subunit D (sdhD)

AF1639 succinyl-CoA synthetase, alpha subunit (sucD-1)

AF2185 succinyl-CoA synthetase, alpha subunit (sucD-2)

AF1540 succinyl-CoA synthetase, beta subunit (sucC-1)

AF2186 succinyl-CoA synthetase, beta subunit (sucC-2)

FATTY ACID AND PHOSPHOLIPID METABOLISM

General 

AF1736 3-hydroxy-3-methylglutaryl-coenzyme A reductase (mvaA)

AF0017 3-hydroxyacyl-CoA dehydrogenase (hbd-1)

AF0285 3-hydroxyacyl-CoA dehydrogenase (hbd-2)

AF0434 3-hydroxyacyl-CoA dehydrogenase (hbd-3)

AF1025 3-hydroxyacyl-CoA dehydrogenase (hbd-4)

AF1122 3-hydroxyacyl-CoA dehydrogenase (hbd-5)

AF1177 3-hydroxyacyl-CoA dehydrogenase (hbd-6)

AF1190 3-hydroxyacyl-CoA dehydrogenase (hbd-7)

AF1206 3-hydroxyacyl-CoA dehydrogenase (hbd-8)

AF2017 3-hydroxyacyl-CoA dehydrogenase (hbd-9)

AF2273 3-hydroxyacyl-CoA dehydrogenase (hbd-10)

AF0018 3-ketoacyl-CoA thiolase (acaB-1)

AF0034 3-ketoacyl-CoA thiolase (acaB-2)

AF0133 3-ketoacyl-CoA thiolase (acaB-3)

AF0134 3-ketoacyl-CoA thiolase (acaB-4)

AF0201 3-ketoacyl-CoA thiolase (acaB-5)

AF0202 3-ketoacyl-CoA thiolase (acaB-6)

AF0283 3-ketoacyl-CoA thiolase (acaB-7)

AF0438 3-ketoacyl-CoA thiolase (acaB-8)

AF0967 3-ketoacyl-CoA thiolase (acaB-9)

AF0968 3-ketoacyl-CoA thiolase (acaB-10)

AF1291 3-ketoacyl-CoA thiolase (acaB-11)

AF2416 3-ketoacyl-CoA thiolase (acaB-12)

AF1028 3-ketoacyl-CoA thiolase (fadA-1)
AF1197 3-ketoacyl-CoA thiolase (fadA-2)
AF2243 3-ketoacyl-CoA thiolase (fadA-3)
AF0033 acyl carrier protein synthase (acaA-1)
AF2415 acyl carrier protein synthase (acaA-2)
AF0199 acyl-CoA dehydrogenase (acd-1)
AF0436 acyl-CoA dehydrogenase (acd-2)
AF0498 acyl-CoA dehydrogenase (acd-3)
AF0671 acyl-CoA dehydrogenase (acd-4)
AF0845 acyl-CoA dehydrogenase (acd-5)
AF0964 acyl-CoA dehydrogenase (acd-6)
AF1026 acyl-CoA dehydrogenase (acd-7)
AF1141 acyl-CoA dehydrogenase (acd-8)
AF1293 acyl-CoA dehydrogenase (acd-9)
AF2057 acyl-CoA dehydrogenase (acd-10)
AF2244 acyl-CoA dehydrogenase (acd-11)
AF2275 acyl-CoA dehydrogenase (acd-12) 



53.1% 



61.4% 



Glycolysis

AF1146 3-phosphoglycerate kinase (pgk) 48.8%

AF1132 enolase (eno) 63.9%

AF1732 glyceraldehyde 3-phosphate dehydrogenase (gap) 56.6%

AF1304 triosephosphate isomerase (tpiA) 56.4%
Pentose phosphate pathway
AF0943 ribose 5-phosphate isomerase (rpi) 48.9%

Sugars
AF0356 carbohydrate kinase, pfkB family 31.3%
AF0401 carbohydrate kinase, pfkB family 34.1%
AF1324 carbohydrate kinase, FGGY family 27.1%
AF1752 carbohydrate kinase, FGGY family 29.3%
AF0861 D-arabino 3-hexulose 6-phosphate formaldehyde lyase (hps-1) 30.6%
AF1306 D-arabino 3-hexulose 6-phosphate formaldehyde lyase (hps-2) 44.2%
AF0480 fuculose-1-phosphate aldolase (fucA) 31.8%



36.6% 
25.9% 
66.9% 
83.6% 
61.3% 
49.0%



BW% 

41.1% 

65.8%

40.7% 

46.6% 

45.2%

36.8% 

46.6% 

36.3%

36.4% 

39.4% 

41.0% 

38.3% 

32.3% 

32.6% 

26.9% 

33.5% 

42.0% 

42.4% 

33.7% 

28.0% 

40.1% 

49.9% 

38.8% 

47.2%

40.3%

28.6% 

58.7%

35.9% 

44.1% 

22.9% 

37.9% 

44.6% 

35.8% 

42.6% 

43.2%

45.8%

44.6% 

42.6% 

38.9%



AF1175 acyl-CoA dehydrogenase, short chain-specific (acdS) 30.1% 

AF0818 acylphosphatase (acyP) 

AF0868 alkyldihydroxyacetonephosphate synthase 

AF2286 bifunctional short chain isoprenyl diphosphate synthase (idsA)

AF0220 biotin carboxylase (acc)

AF0865 carboxylesterase (est-1)

AF1537 carboxylesterase (est-2)

AF2336 carboxylesterase (est-3)

AF1716 carboxylesterase (estA)

AF1744 CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase (pgsA-2)

AF1143 CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase (pgsA-1)

AF2044 CDP-diacylglycerol-serine O-phosphatidyltransferase (pssA)

AF0435 enoyl-CoA hydratase (fad-1)

AF0686 enoyl-CoA hydratase (fad-2)

AF0963 enoyl-CoA hydratase (fad-3)

AF1641 enoyl-CoA hydratase (fad-4)

AF2429 enoyl-CoA hydratase (fad-5)

AF1763 lipase, putative

AF0089 long-chain-fatty-acid-CoA ligase (fadD-1)

AF0200 long-chain-fatty-acid-CoA ligase (fadD-2)

AF0687 long-chain-fatty-acid-CoA ligase (fadD-3)

AF0840 long-chain-fatty-acid-CoA ligase (fadD-4)

AF1029 long-chain-fatty-acid-CoA ligase (fadD-5)

AF1510 long-chain-fatty-acid-CoA ligase (fadD-6)

AF1772 long-chain-fatty-acid-CoA ligase (fadD-7)

AF1932 long-chain-fatty-acid-CoA ligase (fadD-8)

AF2368 long-chain-fatty-acid-CoA ligase (fadD-9)

AF1753 lysophospholipase

AF0196 medium-chain acyl-CoA ligase (alkK-1)

AF0262 medium-chain acyl-CoA ligase (alkK-2)

AF0672 medium-chain acyl-CoA ligase (alkK-3)

AF1261 medium-chain acyl-CoA ligase (alkK-4)

AF2033 medium-chain acyl-CoA ligase (alkK-5)

AF2289 mevalonate kinase (mvk)

AF1794 myo-inositol-1-phosphate synthase (ino1)

AF2045 phosphatidylserine decarboxylase (psd2)

AF1674 sn-glycerol-1-phosphate dehydrogenase (gldA) 



36.8% 
33.6% 

42.7% 
59.1% 
27.1% 
29.0% 
30.4% 
40.4% 

26.7% 



47.6% 
39.9% 
48.6% 
41.7% 
34.7% 
33.5% 
31.9% 
34.8% 
31.1% 
38.1% 
37.8%
36.0% 
38.7% 
31.0% 
38.7% 
33.5% 
34.6% 
36.6% 
31.0% 
42.7% 
33.5% 
40.6% 
32.2%
42.5% 
44.0% 



50.7%

AUTOTROPHIC METABOLISM

General

AF1100 acetyl-CoA decarbonylase/synthase, subunit alpha (cdhA-1)

AF2397 acetyl-CoA decarbonylase/synthase, subunit alpha (cdhA-2)

AF0379 acetyl-CoA decarbonylase/synthase, subunit beta (cdhC)

AF0377 acetyl-CoA decarbonylase/synthase, subunit delta (cdhD)

AF1101 acetyl-CoA decarbonylase/synthase, subunit epsilon (cdhB-1)

AF2398 acetyl-CoA decarbonylase/synthase, subunit epsilon (cdhB-2)

AF0376 acetyl-CoA decarbonylase/synthase, subunit gamma (cdhE)
AF1849 carbon monoxide dehydrogenase, catalytic subunit (cooS)

AF0960 carbon monoxide dehydrogenase, iron sulfur subunit (cooF)

AF1535 ferredoxin-thioredoxin reductase, catalytic subunit (ftrB)

AF2073 formylmethanofuran:tetrahydromethanopterin formyltransferase (ftr-1)
AF2207 formylmethanofuran:tetrahydromethanopterin formyltransferase (ftr-2)



50.4% 
54.0% 
62.7% 
57.4% 
40.0% 



38.9% 
38.6% 
46.0% 
68.4% 
AF1935 N5,N10-methenyltetrahydromethanopterin cyclohydrolase (mch) 97.3%

AF0714 N5,N10-methylenetetrahydromethanopterin dehydrogenase (mtd) 61

AF1066 N5,N10-methylenetetrahydromethanopterin reductase (mer-1) 59.1%

AF1196 N5,N10-methylenetetrahydromethanopterin reductase (mer-2) 37.4%

AF0009 N5-methyltetrahydromethanopterin:coenzyme M methyltransferase (mtr)
AF1567 ribulose bisphosphate carboxylase, large subunit (rbcL-1)
AF1638 ribulose bisphosphate carboxylase, large subunit (rbcL-2) 44.9%
AF1930 tungsten formylmethanofuran dehydrogenase, subunit A (fwdA)
AF1660 tungsten formylmethanofuran dehydrogenase, subunit B (fwdB-1) 37.0%
AF1929 tungsten formylmethanofuran dehydrogenase, subunit B (fwdB-2) 49.4%
AF1931 tungsten formylmethanofuran dehydrogenase, subunit C (fwdC) 44.1%
AF1661 tungsten formylmethanofuran dehydrogenase, subunit D (fwdD-1) 32.6%
AF1928 tungsten formylmethanofuran dehydrogenase, subunit D (fwdD-2) 52.6%
AF0177 tungsten formylmethanofuran dehydrogenase, subunit E (fwdE) 29.7%
AF1644 tungsten formylmethanofuran dehydrogenase, subunit F (fwdF) 35.2%
AF1649 tungsten formylmethanofuran dehydrogenase, subunit G (fwdG)
PURINES, PYRIMIDINES, NUCLEOSIDES, AND NUCLEOTIDES
2'-Deoxyribonucleotide metabolism
AF1108 deoxycytidine triphosphate deaminase, putative
AF1664 ribonucleotide reductase (nrd)
AF1554 thioredoxin reductase (trxB)
AF2047 thymidylate synthase, putative 

Nucleotide and nucleoside interconversions 

AF0876 5'-nucleotidase (nt5)

AF0676 adenylate kinase (adk) 

AF1900 cytidylate kinase (cmk) 

AF0767 nucleoside diphosphate kinase (ndk) 

AF0061 thymidylate kinase (tmk) 

AF1308 thymidylate kinase, putative

AF2042 uridylate kinase (pyrH) 

Purine ribonucleotide biosynthesis 

AF2242 adenylosuccinate lyase (purB)

AF0841 adenylosuccinate synthetase (purA)

AF0873 amidophosphoribosyltransferase (purF)

AF0253 GMP synthase (guaA-1)

AF1320 GMP synthase (guaA-2)

AF1811 inosine monophosphate cyclohydrolase

AF0847 inosine monophosphate dehydrogenase (guaB-1)

AF2118 inosine monophosphate dehydrogenase (guaB-2)

AF1259 inosine monophosphate dehydrogenase, putative

AF1157 phosphoribosylamine-glycine ligase (purD)

AF1271 phosphoribosylaminoimidazole carboxylase (purE) 42.8%

AF1272 phosphoribosylaminoimidazolesuccinocarboxamide synthase (purC) 34.7%

AF1693 phosphoribosylformylglycinamidine cyclo-ligase (purM) 53.8%

AF1260 phosphoribosylformylglycinamidine synthase I (purQ) 40.9%

AF1940 phosphoribosylformylglycinamidine synthase II (purL) 41.5%

AF0589 ribose-phosphate pyrophosphokinase (prsA-1) 35.0%

AF1419 ribose-phosphate pyrophosphokinase (prsA-2)

Pyrimidine ribonucleotide biosynthesis

AF0106 aspartate carbamoyltransferase, catalytic subunit (pyrB)

AF0107 aspartate carbamoyltransferase, regulatory subunit (pyrI)

AF1274 carbamoyl-phosphate synthase, large subunit (carB) 65.1%

AF1273 carbamoyl-phosphate synthase, small subunit (carA) 65.2%

AF0252 CTP synthase (pyrG)

AF2250 dihydroorotase (pyrC)

AF0746 dihydroorotate dehydrogenase (pyrD)

Af%# orotate phosphoribosyl transferase (pyrE)

AKQ86 orotate phosphoribosyl transferase, putative
Salvage of nucleosides and nucleotides
AFOiP adenine deaminase (adeC)
AF1764 dCMP deaminase, putative
AF1788 methylthioadenosine phosphorylase (mtaP)
AF1341 thymidine phosphorylase (deoA-1)

AF1342 thymidine phosphorylase (deoA-2)

AF0239 xanthine-guanine phosphoribosyltransferase (gptA-1) 25.7%

AF1789 xanthine-guanine phosphoribosyltransferase (gptA-2) 28.2%

REGULATORY FUNCTIOMS 

AF1959 (RKiydroxyglutaryi-CoA dehydratase activator (hgdC) 

AF0168 arsenical resistance operon repressor, putative 

AF2204 aryisulfatase regulatory protein, putative 

AFQ074 blotin operon repressor /biotin-[acetyl CoA 

carboxylase] ligase (birA) 

AF1 724 dinitrogenase reductase activating glycohydrolasa 

(draG) 

AF2232 ferric uptake regulation protein (fur) 

AF1785 iron-dependent repressor 

AF2395 iron-dependent repressor 

AF0245 iron-dependent repressor (desR)

AF1984 iron-dependent repressor (troR)

AF2430 lacZ expression regulatory protein (icc)

AF1622 leucine responsive regulatory protein (lrp)

AF0673 mercuric resistance operon regulatory protein (merR) 

AF2425 methanol dehydrogenase regulatory protein (moxR) 

AF1475 mitochondrial benzodiazepine receptor/sensory

transduction protein

AF0198 monoamine oxidase regulatory protein, putative

AF1933 monoamine oxidase regulatory protein, putative

AF0978 nitrogen regulatory protein P-II (glnB-1)

AF1747 nitrogen regulatory protein P-II (glnB-2)
AF1760 nitrogen regulatory protein P-II (glnB-3)
AF0331 pheromone shutdown protein (traB)
AF1797 phosphate regulatory protein, putative
AF0521 protease synthase and sporulation regulator Pai1,

putative
AF1627 repressor protein
AF1793 repressor protein
AF0449 response regulator 
AF1083 response regulator 
AF1266 response regulator 
AF1384 response regulator 
AF1473 response regulator
AF1898 response regulator 
AF2249 response regulator 
AF2419 response regulator 



38.1% 
59.7% 
45.2% 
33.1% 

30.9% 
56.1% 
48.6% 
56.4% 
34.9% 
26.3% 
53.6% 

52.3%
70.8%
56.8%
59.6%
49.4%
38.3%
41.6%
31.9% 
51.6% 
40.9% 



41.1%



48.2% 



AF0004 
AF0021 
AF0208
AF0450 
AF0770 
AF0893 
AF1184 
AF1462 
AF1467
AF1472 
AF1483 
AF1516 
AF1639 
AF1721 
AF2109 
AF0881 

AF0277 
AF0410 
AF0448 
AF1620 
AF2032 
AF2420 
AF0442 
AF1516 
AF1270 
AF1544 
AF1853 
AF2136 
AF0439 
AF0474 
AF0584 
AF1121 
AF1148 
AF1404
AF1448
AF1723
AF1743 
AF2127 
AF0114 
AF1968 
AF0112 
AF1676 
AF1817 
AF0363 
REPLICATION 



RNase L inhibitor

signal-transducing histidine kinase 
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase
signal-transducing histidine kinase,
authentic frameshift

signal-transducing histidine kinase, putative 
signal-transducing histidine kinase, putative 
signal-transducing histidine kinase, putative 
signal-transducing histidine kinase, putative 
signal-transducing histidine kinase, putative 
signal-transducing histidine kinase, putative 
succinoglycan biosynthesis regulator (exsB) 
sugar fermentation stimulation protein (stsA) 
transcriptional regulatory protein, ArsR family
transcriptional regulatory protein, ArsR family
transcriptional regulatory protein, ArsR family
transcriptional regulatory protein, ArsR family
transcriptional regulatory protein, AsnC family
transcriptional regulatory protein, AsnC family
transcriptional regulatory protein, AsnC family
transcriptional regulatory protein, AsnC family
transcriptional regulatory protein, AsnC family
transcriptional regulatory protein, AsnC family
transcriptional regulatory protein, AsnC family
transcriptional regulatory protein, AsnC family
transcriptional regulatory protein, AsnC family 
transcriptional regulatory protein, LysR family 
transcriptional regulatory protein, putative 
transcriptional regulatory protein, Rok family 
transcriptional regulatory protein, Sir2 family 
transcriptional regulatory protein, Sir2 family 
transcriptional regulatory protein, TetR family 
transcriptional repressor (cinR) 



64.5%
26.1%
27.9% 
32.4% 
26.9% 
28.7% 
29.8% 
28.6% 
37.4% 
30.4% 
27.7% 
32.0% 
29.9% 
34.6% 
31.6% 

26.5% 
29.8% 
27.1% 
28.1% 
26.2%
22.6% 
28.4% 
37.2% 
31.0% 
35.4% 
32.3% 
34.9%
39.8%
29.8% 
51.0% 
35.3% 
35.8% 
32.6%
45.1% 
30.6% 
46.4% 
34.9% 
30.8%
35£% 4 
32.9%
38.9%
40.8%
24.5% 
-.27£* : 



58.3% 
37.2%
44.8%
49.0%
39.0%

39.5% 
39.0% 
40.0% 
46.7% 
40.7% 



51.2%
36.7% 
29.9% 

36.6% 

37.9% 
25.8% 
42.0% 
40.0% 
28.2%
28.3% 
29.6% 
29.1% 
37.6% 
48.3% 

38.4% 
41.7% 
38.9% 
61.7% 
58.0% 
60.7% 
40.5% 
30.7% 

52.4% 
59.1% 
54.5% 
38.1% 
36.3% 
42.5% 
44.7% 
32.5% 
48.7% 
44.8% 
37.9% 



30.0% 
66.3% 
43.7% 
46.4% 
58.4% 
46.8% 
32.7% 
44.4% 
32.7% 
45.1% 
32.3% 
31.9% 
36.9% 
26.8% 
44.4% 
32.5% 
37.6% 
59.3% 
40.0% 
28.9% 
36.2%
39.8% 
43.9% 
44.3% 
41.5%

55.3% 
31.4% 
63.6% 
42.0% 
33.7% 
30.2% 
40.7% 
39.3% 

63.0% 



DNA replication, restriction, modification, recombination, and repair 

AF2117 3-methyladenine DNA glycosylase (alkA)

AF2060 activator 1, replication factor C, 35 KDa subunit

AF1195 activator 1, replication factor C, 53 KDa subunit

AF0465 DNA gyrase, subunit A (gyrA)

AF0530 DNA gyrase, subunit B (gyrB)

AF1388 DNA helicase, putative

AF1960 DNA helicase, putative

AF0623 DNA ligase (lig)

AF1725 DNA ligase, putative

AF0497 DNA polymerase B1 (polB)

A£06^3 DNA polymerase B2 (boxA), authentic frameshift

AF0972 DNA polymerase III, subunit epsilon (dnaQ)

AF2277 DNA polymerase, bacteriophage-type

AF0742 DNA primase, putative

AF0264 DNA repair protein RAD2 (rad2)

AF0358 DNA repair protein RAD25

AF10Q1 DNA repair protein RAD32 (rad32) 

AF0993 DNA repair protein RAD51 (radA)

AF2096 DNA repair protein REC

AF2418 DNA repair protein, putative

AF1806 DNA topoisomerase I (topA)

AF0940 DNA topoisomerase VI, subunit A (top6A)

AF0652 DNA topoisomerase VI, subunit B (top6B)

AF1692 endonuclease III (nth)

AF0580 exodeoxyribonuclease III (xthA)

AF2314 methylated-DNA-protein-cysteine
methyltransferase (ogt)

AF1409 modification methylase, type III R/M system

AF1234 mutator protein MutT (mutT)

AF2200 mutator protein MutT, putative

AF0335 proliferating-cell nuclear antigen (pol30)

AF0694 replication control protein A, putative

AF1024 reverse gyrase (top-RG)

AF0621 ribonuclease HII (rnhB)

AF1715 type I restriction-modification enzyme, M subunit,
authentic frameshift

AF1709 type I restriction-modification enzyme, R subunit 38.2% 

AF1710 type I restriction-modification enzyme, S subunit 33.0% 

TRANSCRIPTION 
DNA-dependent RNA polymerase

AF1888 DNA-directed RNA polymerase, subunit A' (rpoA1) 63.6%

AF1889 DNA-directed RNA polymerase, subunit A" (rpoA2) 55.7%

AF1887 DNA-directed RNA polymerase, subunit B' (rpoB1) 65.3%

AF1886 DNA-directed RNA polymerase, subunit B" (rpoB2) 57.1%

AF2282 DNA-directed RNA polymerase, subunit D (rpoD) 34.6%

AF1117 DNA-directed RNA polymerase, subunit E' (rpoE1) 48.4%

AF1116 DNA-directed RNA polymerase, subunit E" (rpoE2) 40.0%

AF1885 DNA-directed RNA polymerase, subunit H (rpoH) 59.5%

AF1131 DNA-directed RNA polymerase, subunit K (rpoK) 61.5%

AF0207 DNA-directed RNA polymerase, subunit L (rpoL) 42.0%

AF1130 DNA-directed RNA polymerase, subunit N (rpoN) 58.8%

Transcription factors 

AF1813 TBP-interacting protein TIP49 4S7<fc 
AF1299 transcription initiation factor IIB &0- 4< *
AF0373 transcription initiation factor IID 59.4%
AF0757 transcription initiation factor IIE, subunit alpha, putative 23.5%
AF1891 transcription termination-antitermination factor NusA,
putative 

AF1235 transcription-associated protein TFIIS 
RNA processing 

AF1783 dimethyladenosine transferase (ksgA) 
AF2087 fibrillarin (fib) 

AF0482 mRNA 3'-end processing factor, putative
AF0532 mRNA 3'-end processing factor, putative
AF2361 mRNA 3'-end processing factor, putative 
AF2399 rRNA methylase, putative 
AF0362 snRNP, putative 
AF0875 snRNP, putative 

TRANSLATION 
Amino acyl tRNA synthetases
AF2265 alanyl-tRNA synthetase (alaS)
AF0894 arginyl-tRNA synthetase (argS)
AF0920 aspartyl-tRNA synthetase (aspS)
AF0411 cysteinyl-tRNA synthetase (cysS)
AF0260 glutamyl-tRNA synthetase (gltX)
AF0916 glycyl-tRNA synthetase (glyS)
AF1642 histidyl-tRNA synthetase (hisS)



48.9% 
59.0% 

44.7% 
49.3%
55.6% 
39.1% 
30.5% 
36.4% 
32.0% 
35.7% 



47.1% 
48.8% 
62.5% 
46.1% 
44.9% 
51.2%
46.0% 



AF0633 isoleucyl-tRNA synthetase (ileS)

AF2421 leucyl-tRNA synthetase (leuS)

AF1216 lysyl-tRNA synthetase (lysS)

AF1463 methionyl-tRNA synthetase (metS)

AF1966 phenylalanyl-tRNA synthetase, subunit alpha (pheS)

AF1424 phenylalanyl-tRNA synthetase, subunit beta (pheT)

AF1609 prolyl-tRNA synthetase (proS)

AF2036 seryl-tRNA synthetase (serS)

AF0648 threonyl-tRNA synthetase (thrS)

AF1694 tryptophanyl-tRNA synthetase (trpS)

AF0776 tyrosyl-tRNA synthetase (tyrS)

AF2224 valyl-tRNA synthetase (valS)

Degradation of proteins, peptides, and glycopeptides

AF1976 26S protease regulatory subunit 4

AF1653 alkaline serine protease (aprM)

AF0678 aminopeptidase, putative

AF0364 ATP-dependent protease La (lon)

AF1946 cysteine proteinase, putative

AF1281 intracellular protease (pfpI)

AF1112 O-sialoglycoprotein endopeptidase (gcp)

AF0666 O-sialoglycoprotein endopeptidase, putative

AF2086 protease inhibitor, putative

AF0490 proteasome, subunit alpha (psmA)

AF0481 proteasome, subunit beta (psmB)

AF2034 X-pro aminopeptidase (pepQ) 

Protein modification

AF0656 antibiotic maturation protein (pmbA)
AF0378 CODH nickel-insertion accessory protein (cooC-1)
AF1685 CODH nickel-insertion accessory protein (cooC-2)
AF1616 cofactor modifying protein (cmo)
AF2195 deoxyhypusine synthase (dys1-1)
AF2300 deoxyhypusine synthase (dys1-2)
AF0381 diphthine synthase (dph5)
AF2324 fmu and fmv protein

AF1367 hydrogenase expression/formation protein (hypA)
AF1368 hydrogenase expression/formation protein (hypB)
AF1369 hydrogenase expression/formation protein (hypC)
AF1370 hydrogenase expression/formation protein (hypD)
AF1365 hydrogenase expression/formation protein (hypE)
AF1366 hydrogenase expression/formation regulatory
protein (hypF)

AF0036 L-isoaspartyl protein carboxyl methyltransferase
(pcm-1) 

AF2322 L-isoaspartyl protein carboxyl methyltransferase 
(pcm-2) 

AF1840 methionyl aminopeptidase (map) 
AF1989 peptidyl-prolyl cis-trans isomerase (slyD)
AF0853 proliferating-cell nucleolar antigen P120, putative
AF2039 proliferating-cell nucleolar antigen P120, putative
AF1449 pyruvate formate-lyase 2 (pflD)
AF1450 pyruvate formate-lyase 2 activating enzyme (pflC)
AF0117 pyruvate formate-lyase activating enzyme (act-1)
AF0918 pyruvate formate-lyase activating enzyme (act-2)
AF1330 pyruvate formate-lyase activating enzyme (act-3)
AF2278 pyruvate formate-lyase activating enzyme (act-4)
AF1961 pyruvate formate-lyase activating enzyme (pflX)
AF0380 transmembrane oligosaccharyl transferase, putative
AF0329 transmembrane oligosaccharyl transferase, putative
Ribosomal proteins: synthesis and modification
AF1490 LSU ribosomal protein L1P (rpl1P)
AF1922 LSU ribosomal protein L2P (rpl2P)
AF1925 LSU ribosomal protein L3P (rpl3P)
AF1924 LSU ribosomal protein L4P (rpl4P)
AF1912 LSU ribosomal protein L5P (rpl5P)
AF1909 LSU ribosomal protein L6P (rpl6P)
AF0764 LSU ribosomal protein L7AE (rpl7AE)
AF1491 LSU ribosomal protein L10E (rpl10E)
AF0538 LSU ribosomal protein L11P (rpl11P)
AF1492 LSU ribosomal protein L12A (rpl12A)
AF1128 LSU ribosomal protein L13P (rpl13P)
AF1915 LSU ribosomal protein L14P (rpl14P)
AF2319 LSU ribosomal protein L15E (rpl15E)
AF1903 LSU ribosomal protein L15P (rpl15P)
AF1127 LSU ribosomal protein L18E (rpl18E)
AF1906 LSU ribosomal protein L18P (rpl18P)
AF1907 LSU ribosomal protein L19E (rpl19E)
AF1529 LSU ribosomal protein L21E (rpl21E)
AF1920 LSU ribosomal protein L22P (rpl22P)
AF1923 LSU ribosomal protein L23P (rpl23P)
AF0537 LSU ribosomal protein L24A (rpl24A)
AF0766 LSU ribosomal protein L24E (rpl24E)
AF1914 LSU ribosomal protein L24P (rpl24P)
AF1918 LSU ribosomal protein L29P (rpl29P)
AF1890 LSU ribosomal protein L30E (rpl30E)
AF1904 LSU ribosomal protein L30P (rpl30P)
AF2066 LSU ribosomal protein L31E (rpl31E)
AF1906 LSU ribosomal protein L32E (rpl32E)
AF0057 LSU ribosomal protein L37AE (rpl37AE)
AF0874 LSU ribosomal protein L37E (rpl37E)
AF2067 LSU ribosomal protein L39E (rpl39E)
AF1430 LSU ribosomal protein L40E (rpl40E)
AF1333 LSU ribosomal protein L44E (rpl44E)
AF2064 LSU ribosomal protein LXA (rplXA)
AF0739 ribosomal protein S18 alanine acetyltransferase
AF2303 ribosomal protein S6 modification protein (rimK)
AF1133 SSU ribosomal protein S2P (rps2P)
AF1919 SSU ribosomal protein S3P (rps3P)
AF1913 SSU ribosomal protein S4E (rps4E)
AF2284 SSU ribosomal protein S4P (rps4P)
AF1905 SSU ribosomal protein S5P (rps5P)
AF0511 SSU ribosomal protein S6E (rps6E)
AF1893 SSU ribosomal protein S7P (rps7P)
AF2152 SSU ribosomal protein S8E (rps8E)
AF1910 SSU ribosomal protein S8P (rps8P)
AF1129 SSU ribosomal protein S9P (rps9P)
AF0938 SSU ribosomal protein S10P (rps10P)
AF2283 SSU ribosomal protein S11P (rps11P)
AF1892 SSU ribosomal protein S12P (rps12P)
AF2285 SSU ribosomal protein S13P (rps13P)
AF1911 SSU ribosomal protein S14P (rps14P)
AF0801 SSU ribosomal protein S15P (rps15P)
AF0911 SSU ribosomal protein S17E (rps17E)
AF1916 SSU ribosomal protein S17P (rps17P)
AF2069 SSU ribosomal protein S19E (rps19E)
AF1921 SSU ribosomal protein S19P (rps19P)
AF1114 SSU ribosomal protein S24E (rps24E) 
AF1113 SSU ribosomal protein S27AE (rps27AE) 
AF1334 SSU ribosomal protein S27E (rps27E) 
AF0765 SSU ribosomal protein S28E (rps28E) 
AF2320 SSU ribosomal protein S3AE (rps3AE) 
tRNA modification 

AF0588 archaeosine tRNA-ribosyltransferase (tgtA)
AF1954 Glu-tRNA amidotransferase, subunit A (gatA-1)
AF2329 Glu-tRNA amidotransferase, subunit A (gatA-2)
AF1440 Glu-tRNA amidotransferase, subunit B (gatB-1)
AF2116 Glu-tRNA amidotransferase, subunit B (gatB-2)



49.7% 
43.6% 
46.2%
44.4% 
42.6% 
56.8% 
45.4% 
46.9% 
62.4% 
57.6% 
64.6% 

66.0% 
44.5% 
27.8% 
36.6%
36.2%
66.0%
6?JB% 
35.8%
37.0% 
60.8% 
S83% 
34.6%

32.7% 
35.7% 
47.4% 
27.2%
32.6% 
34.9% 
40.8% 
40.0% 
40.4%
54.4% 
40.5% 
46.0% 
51.5% 

46.1% 

60.7% 

59.3% 
48.6% 
34.4% 
35.7% 
44.2%
37.8% 
38.8% 
25.5% 
42.3% 
45.8% 
42.5% 
50.2%
27.3%
29.3% 



48.6% 

60.4% 

56.5% 

56.4% 

51.7% 

53.7% 

60.7% 

45.6% 

67.3%

76.0% 

47.4% 

66.7% 

70.3% 

53.8% 

53.8% 

57.8% 

65.5% 

53.2%

55.2%

55.6% 

61.4% 

66.1% 

57.8% 

44.6%

41.7% 

55.9% 

50.6% 

51.2%

47.6% 

57.9% 

56.9% 

73.3% 

46.8% 

53.8% 

38.5% 

32.2%

58.3% 

50.0% 

48.9% 

69.1% 

60.0% 

50.8% 

59.6% 

61.6% 

64.6% 

69.5% 

71 JO* 

71.1% 

74.1% 

52.1% 

61.5%

62.0% 

52.6% 

69.0% 

64.2%

60.9% 

40.2%

60.0% 

49.0% 

55.6%

38.9% 



52.0% 
38.6% 
53.5% 
64.7% 
46.4% 



Nature ©Macmillan Publishers Ltd 1997 



AF2326 Glu-tRNA amidotransferase, subunit C (gatC)

AF0816 N2,N2-dimethylguanosine tRNA methyltransferase
(trm1)

AF1730 pseudouridylate synthase I (truA)

AF1485 queuine tRNA-ribosyltransferase (tgtB)

AF0493 ribonuclease PH (rph)

AF0900 tRNA intron endonuclease (endA)

AF2156 tRNA nucleotidyltransferase (cca)

Translation factors 

AF2360 ATP-dependent RNA helicase HepA, putative

AF2254 ATP-dependent RNA helicase, DEAD-family (deaD) 52.2%

AF0071 ATP-dependent RNA helicase, putative

AF1468 ATP-dependent RNA helicase, putative

AF2406 ATP-dependent RNA helicase, putative

AF1149 large helicase-related protein (lhr-1)

AF2177 large helicase-related protein (lhr-2), authentic
frameshift

AF1220 peptide chain release factor eRF, subunit 1 61.2%

AF2246 SKI2-family helicase, authentic frameshift 46.7%

AF0937 translation elongation factor EF-1, subunit alpha (tuf) 74.4%

AF0574 translation elongation factor EF-1, subunit beta 31.3%

AF1894 translation elongation factor EF-2 (fus) 82.5%

AF0777 translation initiation factor eIF-1A (eif1A) 67.6%

AF0627 translation initiation factor eIF-2, subunit alpha (eif2A) 51.1%

AF2326 translation initiation factor eIF-2, subunit beta, putative 46.5%

AF0592 translation initiation factor eIF-2,

subunit gamma (eif2G) 64.4%

AF0370 translation initiation factor eIF-2B, subunit

delta (eif2BD) 53.3%

AF2037 translation initiation factor eIF-2B, subunit

delta (eif2BD) 57.9%

AF0645 translation initiation factor eIF-5A (eif5A) 50.4%

AF0768 translation initiation factor IF-2 (infB) 52.2%

TRANSPORT AND BINDING PROTEINS

General

AF0393 ABC transporter, ATP-binding protein 34.6%

AF0984 ABC transporter, ATP-binding protein 36.2%

AF1006 ABC transporter, ATP-binding protein 36.1%

AF1018 ABC transporter, ATP-binding protein 57.7%

AF1021 ABC transporter, ATP-binding protein 37.8%

AF1136 ABC transporter, ATP-binding protein 39.3%

AF1139 ABC transporter, ATP-binding protein 38.2%

AF1300 ABC transporter, ATP-binding protein 34.1%

AF1469 ABC transporter, ATP-binding protein 43.6%

AF1819 ABC transporter, ATP-binding protein 61.1% 

AF1982 ABC transporter, ATP-binding protein 41.3% 

AF2364 ABC transporter, ATP-binding protein 53.5% 

AF1005 ABC transporter, ATP-binding protein, putative 28.7% 

AF1064 ABC transporter, ATP-binding protein, putative 36.0% 

AF1983 ABC transporter, periplasmic binding protein 25.4% 

AF1981 ABC transporter, permease protein 29.9% 

AF1996 sodium- and chloride-dependent transporter 52.5% 



35.1% 






AF1768 


38.2%


AF1769 


37.4%


AF0680 


44.1%


AF0231 


30.8% 




41.8%


AF0232 


43.9% 


AF0981 




AF0979 


31.5%


AF0980
AF0982 


52.2%


AF0016 


29.6% 


AF0969 


48.1% 


AF1222 


36.2%
34.6% 


AF1608 


56.0% 


AF1606 



Amino acids, peptides and amines 
AF1766 amino-acid ABC transporter, periplasmic

binding protein/protein kinase
AF0222 branched-chain amino acid ABC transporter,

ATP-binding protein (braF-1)
AF0822 branched-chain amino acid ABC transporter,

ATP-binding protein (braF-2)
AF0959 branched-chain amino acid ABC transporter, ATP-
binding protein (braF-3)
AF1390 branched-chain amino acid ABC transporter,

ATP-binding protein (braF-4)

AF0221 branched-chain amino acid ABC transporter,

ATP-binding protein (braG-1)
AF0823 branched-chain amino acid ABC transporter,

ATP-binding protein (braG-2)
AF0958 branched-chain amino acid ABC transporter,

ATP-binding protein (braG-3)
AF1389 branched-chain amino acid ABC transporter, ATP-
binding protein (braG-4)
AF0223 branched-chain amino acid ABC transporter,

periplasmic binding protein (braC-1)
AF0827 branched-chain amino acid ABC transporter,

periplasmic binding protein (braC-2)
AF0962 branched-chain amino acid ABC transporter,
periplasmic binding protein (braC-3)
AF1391 branched-chain amino acid ABC transporter,

periplasmic binding protein (braC-4)
AF0224 branched-chain amino acid ABC transporter,

permease protein (braD-1)
AF0825 branched-chain amino acid ABC transporter,

permease protein (braD-2)
AF0961 branched-chain amino acid ABC transporter,

permease protein (braD-3)
AF1392 branched-chain amino acid ABC transporter,

permease protein (braD-4)
AF0225 branched-chain amino acid ABC transporter,

permease protein (braE-1)
AF0824 branched-chain amino acid ABC transporter,

permease protein (braE-2)
AF0960 branched-chain amino acid ABC transporter,

permease protein (braE-3)
AF1393 branched-chain amino acid ABC transporter,

permease protein (braE-4)
AF1612 cationic amino acid transporter (cat-1)
AF1774 cationic amino acid transporter (cat-2)
AF1770 dipeptide ABC transporter, ATP-binding protein (dppD) 47.8%
AF1771 dipeptide ABC transporter, ATP-binding protein (dppF) 43.1%
AF1767 dipeptide ABC transporter, dipeptide-binding



27.4% 



42.7% 



44.7%



37.8%



69.7% 



48.2% 



42.9% 



34.1% 
64.6% 



34.3% 



26.6% 



60.1% 
25.4% 



23.9% 

65.4% 

28.7% 

31.3%

30.1% 

60.5% 
29.6% 
38.0% 



protein (dppA) 33.1%
dipeptide ABC transporter, permease protein (dppB) 39.3%
dipeptide ABC transporter, permease protein (dppC) 40.8%
glutamine ABC transporter, ATP-binding protein (glnQ) 63.8%
glutamine ABC transporter, periplasmic glutamine-
binding protein (glnH) 38.0%
glutamine ABC transporter, permease protein (glnP) 39.3%
osmoprotection protein (proV) 39.0%
osmoprotection protein (proW-1) 32.8%
osmoprotection protein (proW-2) 36.9%
osmoprotection protein (proX) 28.7%
proline permease (putP-1) 26.2%
proline permease (putP-2) 27.4%
proline permease (putP-3) 27.0%
spermidine/putrescine ABC transporter, ATP-
binding protein (potA) 50.2%
spermidine/putrescine ABC transporter, periplasmic
spermidine/putrescine-binding protein (potD),
authentic frameshift 31.EPh 
AF1607 spermidine/putrescine ABC transporter, permease

protein (potB) 38.0%
AF1606 spermidine/putrescine ABC transporter, permease

protein (potC) 38.7%

Anions 

AF2308 arsenite transport protein (arsB) 27.3%

AF1415 chloride channel, putative 27.3%

AF0026 cyanate transport protein (cynX) 24.6%

AF0087 nitrate ABC transporter, ATP-binding protein (nrtC-1) 47.4%

AF0638 nitrate ABC transporter, ATP-binding protein (nrtC-2) 55.5%

AF0640 nitrate ABC transporter, ATP-binding protein, putative 32.5%

AF0086 nitrate ABC transporter, permease protein (nrtB-1) 35.4%

AF0639 nitrate ABC transporter, permease protein (nrtB-2) 37.4%
AF1359 phosphate ABC transporter, ATP-binding

protein (pstB) 66.0%
AF1356 phosphate ABC transporter, periplasmic phosphate-
binding protein (phoX) 25.1%
AF1358 phosphate ABC transporter, permease protein (pstA) 34.1%
AF1357 phosphate ABC transporter, permease protein (pstC) 33.7%
AF1360 phosphate ABC transporter, regulatory protein (phoU) 26.9%
AF0791 phosphate permease, putative 31.1%
AF1798 phosphate permease, putative 62.9%
AF0092 sulfate ABC transporter, ATP-binding protein (cysA) 64.2%
AF0093 sulfate ABC transporter, permease protein (cysT) 44.1%

Carbohydrates, organic alcohols, and acids

AF0347 C4-dicarboxylate transporter (mae1) 24.5%
AF1426 glycerol uptake facilitator, MIP channel (glpF) 36.2%
AF0013 hexuronate transporter (exuT) 25.1%
AF0806 L-lactate permease (lctP) 31.7%
AFOOOfi oxalate/formate antiporter (oxlT-1) 25.7%
AF0367 oxalate/formate antiporter (oxlT-2) 33.2%
AF1069 pantothenate permease (panF-1) 26.9%
AF1205 pantothenate permease (panF-2) 24.8%
AF0237 pantothenate permease (panF-3) 25.1%
AF0041 polysaccharide ABC transporter, ATP-binding

protein (rfbB-1) 42.5%
AF0290 polysaccharide ABC transporter, ATP-binding protein
(rfbB-2) 43.9%

AF0042 polysaccharide ABC transporter, permease protein

(rfbA-1) 27.5%

AF0289 polysaccharide ABC transporter, permease protein

(rfbA-2) 28.5%
AF0887 ribose ABC transporter, ATP-binding protein (rbsA-1) 33.3%
AF1170 ribose ABC transporter, ATP-binding protein (rbsA-2) 27.9%
AF0888 ribose ABC transporter, permease protein (rbsC-1) 24.1%
AF0889 ribose ABC transporter, permease protein (rbsC-2) 31.2%
AF2014 sugar transporter, putative 26.0% 



Cations 
AF0977 
AF1746 
AF1749 
AF0473 
AF0152 
AF0246 
AF2394 
AF0661
AF0430 
AF0432 
AF1401 
AF1397 

AF0431 
AF1402 
AF0786 
AF0346 

AF0217 
AF1245 
AF0846 
AF0715 
AF1673 
AF2197 
AF0218 



AF2268 multidrug resistance protein 31.3%
OTHER CATEGORIES 
Adaptations and atypical conditions 

AF0606 ethylene-inducible protein 74.5%

AF0235 heat shock protein (htpX) 32.9%

AF0942 surE stationary-phase survival protein (surE) 50.2%

AF1996 virulence associated protein C (vapC-1) 60.0%

AF1690 virulence associated protein C (vapC-2) 30.0%

Drug and analog sensitivity

AF1884 daunorubicin resistance ATP-binding protein (drrA) 47.1%

AF1883 daunorubicin resistance membrane protein (drrB) 27.3%

AF0487 penicillin G acylase 31.7%

AF1214 phenylacrylic acid decarboxylase (pad1) 43.2%

AF2194 rRNA (adenine-N6)-methyltransferase, putative 29.2%

AF1696 small multidrug export protein (qacE) 39.0%



ammonium transporter (amt-1) 44.3%

ammonium transporter (amt-2) 49.0%

ammonium transporter (amt-3) 41.6%

cation-transporting ATPase, P-type (pacS) 44.0%

copper-transporting ATPase, P-type (copB) 44.5%

iron (II) transporter (feoB-1) 33.3%

iron (II) transporter (feoB-2) 48.0%

iron (II) transporter (feoB-3), authentic frameshift 29.4%
iron (III) ABC transporter, ATP-binding protein (hemV-1) 50.4%
iron (III) ABC transporter, ATP-binding protein (hemV-2) 68.7%
iron (III) ABC transporter, ATP-binding protein (hemV-3) 35.2%
iron (III) ABC transporter, periplasmic hemin-binding protein

(hemT), authentic frameshift 28.2%

iron (III) ABC transporter, permease protein (hemU-1) 36.2%

iron (III) ABC transporter, permease protein (hemU-2) 35.2%

magnesium and cobalt transporter (corA) 40.1%
mercuric transport protein periplasmic

component (merP) 36.2%

Na+/H+ antiporter (napA-1) 28.2%

Na+/H+ antiporter (napA-2) 28.4%

Na+/H+ antiporter (nhe2) 33.1%

potassium channel, putative 39.5%

potassium channel, putative 36.3%

potassium channel, putative 24.6%

TRK potassium uptake system protein (trkA-1) 30.2%

TRK potassium uptake system protein (trkA-2) 42.9%

TRK potassium uptake system protein (trkH) 39.8%



Transposon-related functions

AF0120 insertion sequence ISH S1, authentic frameshift
AF0193 ISA0963-1, putative transposase, authentic frameshift
AF0309 ISA0963-2, putative transposase
AF1310 ISA0963-3, putative transposase
ISA0963-4, putative transposase
ISA0963-5, putative transposase
ISA0963-6, putative transposase
ISA0963-7, putative transposase, authentic frameshift



34.3%
34.3%



AF1383 
AF1410 
AF1706 
AF1636

AF0878 ISA1083-1, ISORF2

AF0879 ISA1083-1, putative transposase

AF1351 ISA1083-2, ISORF2

AF1352 ISA1083-2, putative transposase

AF2140 ISA1083-3, ISORF2

AF2139 ISA1083-3, putative transposase

AF0278 ISA1214-1, ISORF2

AF0279 ISA1214-1, putative transposase

AF0305 ISA1214-2, ISORF2

AF0306 ISA1214-2, putative transposase

AF0641 ISA1214-3, ISORF2

AF0642 ISA1214-3, putative transposase

AF0857 ISA1214-4, ISORF2

AF0858 ISA1214-4, putative transposase

AF2091 ISA1214-5, ISORF2

AF2092 ISA1214-5, putative transposase

AF2223 ISA1214-6, ISORF2

AF2222 ISA1214-6, putative transposase

AF0138 transposase IS240-A

AF0895 transposase IS240-A

AF2390 transposase, authentic frameshift

AF0137 transposase, putative

AF1628 transposase, putative



33.6% 
20.0% 
33.6% 
37.2%
30.8%
31.5% 
30.8% 
31.5% 
27.7% 
33.3% 
27.7%
33.3% 
26.5% 
33.3% 
27.7% 
33.3% 
26.5% 
33.3% 
26.6%
25.6% 
43.3% 
46.2%
24.0% 
29.6% 
32.8% 

35.0% 
39.5% 
30.9% 
34.4% 
30.8% 
29.9% 
31.2%
21.7%
31.2%
49.4% 
42.5% 
28.0% 
34.1% 
29.4% 



Other

AF0834 
AF1980
AF1144
AF1325 



ferritin, putative 

heme exporter protein C (helC)
multidrug resistance protein 
multidrug resistance protein 



39.8% 
29.0% 
29.2%
29.9% 



UNKNOWN 

AF0477 AAA superfamily ATPase

AF0613 allene oxide synthase, putative

AF0478 ATP-binding protein PhnP (phnP)

AF1775 atrazine chlorohydrolase, putative

AF0973 bile acid-inducible operon protein F (baiF-1)

AF0974 bile acid-inducible operon protein F (baiF-2)

AF1315 bile acid-inducible operon protein F (baiF-3)

AF2063 c-myc binding protein, putative

AF1992 calcium-binding protein, putative

AF2287 carotenoid biosynthetic gene ERWCRTS, putative

AF0512 chloroplast inner envelope membrane protein

AF2251 competence-damage protein, putative

AF0090 dehydrase

AF1498 dehydrase, putative

AF1518 DNA/pantothenate metabolism flavoprotein, putative 51.4%

AF0039 dolichol-P-glucose synthetase, putative 33.7%

AF0328 dolichol-P-glucose synthetase, putative 39.0%

AF0581 dolichol-P-glucose synthetase, putative 27.5%

AF0569 DR-beta chain MHC class II 37.7%

AF0383 endonuclease III, putative 47.1%

AF1150 erpK protein, putative 54.9% 

AF2372 extragenic suppressor (suhB) 37.0% 

AF1418 glycerol-3-phosphate cytidyltransferase (tagD) 56.6%

AF0744 GTP-binding protein 33.4% 

AF1181 GTP-binding protein 36.3% 

AF1364 GTP-binding protein 67.5% 

AF2146 GTP-binding protein 66.9% 

AF0428 GTP-binding protein, GTP1/OBG-family 43.9%

AF2237 HAM1 protein 31.4% 

AF2211 HIT family protein (hit) 29.6% 

AF0216 L-isoaspartyl protein carboxyl methyltransferase

PimT, putative 35.5%

AF2313 maoC protein (maoC) 43.0% 

AF0429 methyltransferase 43.8% 

AF0186 nifS protein, class-V aminotransferase (nifS-1) 46.1%

AF0564 nifS protein, class-V aminotransferase (nifS-2) 46.1%

AF0185 nifU protein (nifU-1) 55.6%

AF0565 nifU protein (nifU-2) 55.6%

AF0632 nifU protein (nifU-3) 47.4%

AF1781 nodulation protein NfeD (nfeD) 33.4%

AF2269 nucleotide-binding protein 48.7% 

AF2382 nucleotide-binding protein 49.1% 

AF0374 p-nitrophenyl phosphatase (pho2) 31.7% 

AF1978 periplasmic divalent cation tolerance protein (cutA) 31.3% 

AF1662 prepro-subtilisin sendai, putative 35.6% 

AF2021 rod shape-determining protein (mraB) 26.6% 

AF1778 stage V sporulation protein (spoVG) 43.9%

AF1970 TPR domain-containing protein 29.0% 

AF2202 tryptophan-specific permease, putative 25.2%

AF0816 vtpJ-therm, putative 42.1% 

AF1679 vtpJ-therm, putative 46.1% 
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Emergence of symbiosis in 
peptide self-replication 
through a hypercyclic network 

David H. Lee, Kay Severin, Yohei Yokobayashi 
& M. Reza Ghadiri 

Nature 390, 591-594 (1997) 

Hypercycles are based on second-order (or higher) autocatalysis
and defined by two or more replicators that are connected by
another superimposed autocatalytic cycle. Our study describes a
mutualistic relationship between two replicators, each catalysing the
formation of the other, that are linked by a superimposed catalytic
cycle. Although the kinetic data suggest the intermediary of higher-
order species in the autocatalytic processes, the present system
should not be referred to as an example of a minimal hypercycle
in the absence of direct experimental evidence for the autocatalytic
cross-coupling between replicators.
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The pathway for sulphate reduction is incorrect as published: in 
Fig. 3 on page 367, adenylyl sulphate 3'-phosphotransferase (cysC) is
not needed in the pathway as outlined, as adenylyl sulphate 
reductase (aprAB) catalyses the first step in the reduction of adenylyl 
sulphate. The correct sequence of reactions is: sulphate is first 
activated to adenylyl sulphate, then reduced to sulphite and subse- 
quently to sulphide. The enzymes catalysing these reactions are: 
sulphate adenylyltransferase (sat), adenylylsulphate reductase 
(aprAB), and sulphite reductase (dsrABD). We thank Jens-Dirk
Schwenn for bringing this error to our attention.
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Agrobacterium tumefaciens is a plant patho- 
gen with the unique ability to transfer a de- 
fined segment of DNA to eukaryotes, where 
it integrates into the eukaryotic genome. This 
ability to transfer and integrate DNA is used 
for random mutagenesis and has been adapt- 
ed into a powerful tool for production of 
transgenic plants, including soybean, maize, 
and cotton (1, 2). A. tumefaciens was identi- 
fied early in the 20th century as the causal 
agent of crown gall disease in plants (3).
Pathogenesis is initiated when Agrobacte-
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rium detects small molecules released by ac-
tively growing cells in a plant wound. These
molecules induce a series of virulence (vir) 
genes whose encoded products export the 
single-stranded transferred DNA (T-DNA) to 
the plant cell, where it integrates into the 
genome at an essentially random location. 
Once integrated, T-DNA gene expression al- 
ters plant hormone levels, leading to cell 
proliferation typical of a gall tumor. The T- 
DNA also encodes enzymes for the synthesis 
of opines, a class of nutrient molecules used 
almost exclusively by A. tumefaciens (4-7). 

A. tumefaciens strains fall into three bio- 
vars, which differ in their host range, meta- 
bolic characteristics, relationships with other 
genera in the family Rhizobiaceae, and po- 
tentially their chromosome structure (4-13). 
The taxonomy of the Rhizobiaceae family is 
not without controversy, but we expect that 
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Agrobacterium tumefaciens is a plant pathogen capable of transferring a defined 
segment of DNA to a host plant, generating a gall tumor. Replacing the trans- 
ferred tumor-inducing genes with exogenous DNA allows the introduction of 
any desired gene into the plant. Thus, A. tumefaciens has been critical for the 
development of modern plant genetics and agricultural biotechnology. Here we 
describe the genome of A. tumefaciens strain C58, which has an unusual
structure consisting of one circular and one linear chromosome. We discuss 
genome architecture and evolution and additional genes potentially involved 
in virulence and metabolic parasitism of host plants. 
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